# Technology transfer from the Department of Geoscience and Petroleum (NTNU), measured by the number of ideas submitted to TTO
## Introduction
The [Department of Geoscience and Petroleum](https://www.ntnu.edu/igp/department-of-geoscience-and-petroleum) at the [Norwegian University of Science and Technology](https://www.ntnu.edu/) in Trondheim has been an important supplier of technology and knowledge to the petroleum and geology-related industries for many decades. Employees of the department transfer technology to industry through different mechanisms, including project deliverables, dissertations, technology licensing, and spin-off companies (see figure below). In this notebook I shed some light on the last two mechanisms: licensing and spin-offs. These tech transfer mechanisms are normally collaborative projects between the department and [TTO](https://www.ntnutto.no/home/) (Technology Transfer Office NTNU). The main role of TTO is to facilitate these projects, including, among other things, business development and evaluation of project ideas received from NTNU employees. This notebook shows some statistics and a simple analysis of the innovative project ideas submitted to TTO over the last 14 years, between 2006 and 2019. It is based on a dataset kindly provided by TTO in December 2019 and supplemented with additional information by the author of the notebook.

In [None]:
from IPython.display import Image
Image("../input/images/chart_tt.png", width=400)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import squarify
from wordcloud import WordCloud
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

In [None]:
# Path of the file to read
filepath = "../input/innovat/tto_public_comma.csv"

# Read the file into a variable
data = pd.read_csv(filepath)

## Dataset
The dataset contains selected details of the ideas submitted to TTO by employees: project title (anonymized in this notebook), project number, submission year, department (here only the Department of Geoscience and Petroleum is considered), status of the idea, and main inventor name (anonymized in this notebook). The dataset was supplemented with information about the department's research groups and employee groups. The table below shows the first rows of the dataset used for the visualizations and analysis that follow.

In [None]:
data.head()

## Results and analysis: general trends 
### Number of ideas submitted to TTO each year
The first chart (stacked bar plot) below shows the number of ideas TTO received from the department's employees (number of submissions) each year. Note that in January 2017 the Department of Geoscience and Petroleum (IGP) was formed by merging two departments: the Department of Petroleum Technology and Applied Geophysics (IPT) and the Department of Geology and Mineral Resources Engineering (IGB). The results before 2017 therefore distinguish between these two former departments. The table printed below the chart gives the number of ideas per department and year, with row and column totals.

In [None]:
# Data frame of the number of ideas per year from the different departments
df = pd.crosstab(data.Department, data.Year)

# Move the last row (department) to the top
rows = df.index.tolist()
rows = rows[-1:]+rows[:-1]
df=df.loc[rows]

# Make a list of categories (years)
cats = df.columns.tolist()
# Make a list of values (number of ideas)
vals = df.values.tolist()
# Make a list of indexes (departments)
index = df.index.tolist()

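# Plot a stacked bar chart: each department's bars start on top of the previous ones
colors = ['#deebf7','#9ecae1','#3182bd']
plt.figure(figsize=(15,10))
bottom = np.zeros(len(cats))
for t, dept in enumerate(index):
    plt.bar(cats, vals[t], bottom=bottom, label=dept, color=colors[t])
    bottom += vals[t]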
plt.legend(loc='upper left')
plt.title('Ideas submitted to TTO each year')
plt.ylabel("Number of ideas")
plt.show()

#Total sum per column: 
df.loc['Total',:]= df.sum(axis=0)
#Total sum per row: 
df.loc[:,'Total'] = df.sum(axis=1)
print (df)
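The department-by-year table with totals can also be built in a single pandas call. The sketch below is an alternative, not the method used in this notebook: it assumes a few hypothetical records with the same `Department` and `Year` columns (the department labels are placeholders) and uses the `margins=True` option of `pd.crosstab`, which appends a row and a column of sums.

```python
import pandas as pd

# Hypothetical toy records with the same columns as the TTO dataset (not real data)
toy = pd.DataFrame({
    "Department": ["IGB", "IGB", "IPT", "IGP", "IGP"],
    "Year": [2006, 2015, 2016, 2017, 2017],
})

# margins=True adds a row and a column with the sums; margins_name sets their label
table = pd.crosstab(toy.Department, toy.Year, margins=True, margins_name="Total")
print(table)
```

To draw stacked bars from such a table, one option is `table.drop(index="Total", columns="Total").T.plot(kind="bar", stacked=True)`, which stacks the departments within each year.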

### What are the ideas about?
The ideas relate to the core activities of the department: petroleum and geoscience. Each idea, however, is unique and addresses a concrete need. The chart below shows the frequency of the words inventors used to describe their ideas. To keep the idea names secret, they are not part of this dataset; the chart was generated from an internal dataset that contains this information.

In [None]:
Image("../input/images/wordcloud.png", width=700)

### Number of ideas submitted by research groups
The treemap below shows the submitted ideas broken down by the research group they came from, which also reflects the industry area they belong to. The ideas were binned into six groups reflecting the actual structure of the department:

- Engineering - Engineering Geology and Rock Mechanics <br>
- Geology - Geology<br>
- Geophysics - Geophysics<br>
- Mineral - Mineral Production and HSE<br>
- Reservoir - Reservoir Engineering and Petrophysics<br>
- Well - Well Construction and Production Systems<br>

In [None]:
index, counts = np.unique(data.Group,return_counts=True)
#print(index)
#print(counts)

colors = ['#d73027','#fc8d59','#fee090','#ffffbf','#e0f3f8','#91bfdb']

plt.figure(figsize=(15,10))
squarify.plot(sizes=counts, label=index, color=colors)
plt.axis('off')
plt.title('Contribution of research groups to total number of ideas')
plt.show()
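The treemap tiles are sized by the counts returned by `np.unique(..., return_counts=True)`. A small sketch, assuming hypothetical group labels, shows that `value_counts().sort_index()` gives the same counts in the same alphabetical order, and how the counts could be turned into percentage labels for the tiles; the label format is only an illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical group labels standing in for data.Group (not real data)
groups = pd.Series(["Engineering", "Mineral", "Engineering", "Well", "Reservoir", "Engineering"])

# np.unique returns sorted labels and their counts, as in the cell above
labels, counts = np.unique(groups, return_counts=True)

# value_counts gives the same counts; sort_index puts them in the same (alphabetical) order
vc = groups.value_counts().sort_index()

# Share of each group in percent, formatted as tile labels such as "Engineering\n50%"
shares = (100 * vc / vc.sum()).round(1)
tile_labels = [f"{g}\n{s:g}%" for g, s in shares.items()]
print(tile_labels)
```

Passing `sizes=vc.values` and `label=tile_labels` to `squarify.plot` would then show each group's share directly on its tile.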


### Number of ideas submitted by employee groups

In [None]:
# Which employee groups do the ideas come from?

index, counts = np.unique(data.Employee,return_counts=True)
#print(index)
#print(counts)

colors = ['#d73027','#fc8d59','#fee090','#ffffbf','#e0f3f8','#91bfdb','#4575b4']
plt.figure(figsize=(15,10))
squarify.plot(sizes=counts, label=index, color=colors)
plt.title('Contribution of employee groups to total number of ideas')
plt.axis('off')
plt.show()

### What is the status of ideas submitted to TTO?
What happened to the ideas after submission? Did they become projects, or were they closed? The chart below shows their status as of December 2019.

In [None]:
index, counts = np.unique(data.Status,return_counts=True)
#print(index)
#print(counts)

colors = ['#d7191c','#fdae61','#abd9e9','#2c7bb6']

plt.figure(figsize=(15,10))
squarify.plot(sizes=counts, label=index, color=colors)
plt.title('Status of all submitted ideas as of 12/2019')
plt.axis('off')
plt.show()

### Who submits the ideas?
Inventors create and submit different numbers of ideas to TTO. The diagram below shows the most active inventors in the department. Inventors who submitted more than one idea are shown individually; the remaining ones are pooled in the *'Others'* group. Real inventor names were anonymized.

In [None]:
index, counts = np.unique(data.Inventor,return_counts=True)
#print(index)
#print(counts)

# Pool inventors who reported only one idea into a single 'Others' group
g_index = []
g_counts = []
g_rest = 0

for i in range(len(index)):
    if counts[i]>1: 
        g_counts.append(counts[i])
        g_index.append(index[i])
    elif counts[i] == 1:
        g_rest += 1

g_counts.append(g_rest)
g_index.append('Others')    
 
colors = ['#d73027','#f46d43','#fdae61','#fee090','#e0f3f8','#abd9e9','#74add1','#4575b4']
# White circle in the center turns the pie chart into a donut chart
my_circle=plt.Circle( (0,0), 0.9, color='white')
plt.figure(figsize=(15,10))
plt.pie(g_counts, labels=g_index, colors = colors)
p=plt.gcf()
p.gca().add_artist(my_circle)
plt.title('Contribution of employees to total number of ideas')
plt.show()
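The loop that pools one-time inventors into *'Others'* can also be written with boolean masks on `value_counts()`. A sketch with hypothetical anonymized labels; note that `value_counts()` orders inventors by number of ideas, whereas `np.unique` in the cell above orders them alphabetically, so the wedge order would differ.

```python
import pandas as pd

# Hypothetical anonymized inventor labels standing in for data.Inventor (not real data)
inventors = pd.Series(["A", "B", "A", "C", "D", "A", "B"])

counts = inventors.value_counts()
repeat = counts[counts > 1]          # inventors with more than one idea
others = counts[counts == 1].sum()   # total ideas from one-time inventors

# Same content as g_index / g_counts in the cell above
pie_counts = pd.concat([repeat, pd.Series({"Others": others})])
print(pie_counts.to_dict())
```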

## Results and analysis: why did tech transfer surge in 2016?
The most intriguing question is **what happened in 2016?** Before 2016, the department's employees reported relatively few ideas to TTO. This does not mean that the department was not developing new technology. It was, but the technology was mostly transferred through other mechanisms, such as deliverables of research projects that were mostly sponsored by industry. In fact, the average number of submitted ideas per year between 2006 and 2015 was exactly 1. This pattern broke at some point in 2016: from 2016 on, the yearly average jumped from 1.00 to 7.75. But why? Was it an effect of increased interest in innovation and entrepreneurship in Norway at that time? Was it due to fewer research projects? Fewer career opportunities for temporary staff? The oil crisis? Or something else?
<br><br>
Another observation is that before 2015 all submitted ideas came from the former Department of Geology and Mineral Resources Engineering (IGB), representing engineering geology, mineral production, and geology. Before 2015, employees of the former Department of Petroleum Technology and Applied Geophysics (IPT), representing reservoir engineering, petrophysics, and well construction and production, did not submit any ideas to TTO. In 2015 and 2016 IPT woke up and began reporting more and more projects to TTO. But again, why in 2016?



In [None]:
# Data frame of the number of ideas per year from the different departments
df = pd.crosstab(data.Department, data.Year)

# Move the last row (department) to the top
rows = df.index.tolist()
rows = rows[-1:]+rows[:-1]
df=df.loc[rows]

# Make a list of categories (years)
cats = df.columns.tolist()
# Make a list of values (number of ideas)
vals = df.values.tolist()
# Make a list of indexes (departments)
index = df.index.tolist()

# Count the ideas per year across all departments (years without submissions count as 0)
ind = list(range(2006, 2020))
years = data['Year'].tolist() 
year_counts = []
for ele in ind: 
    year_counts.append(years.count(ele))
avg_pre2016 = np.average(year_counts[0:10])
avg_post2016 = np.average(year_counts[10:14])    

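# Plot a stacked bar chart: each department's bars start on top of the previous ones
colors = ['#deebf7','#9ecae1','#3182bd']
plt.figure(figsize=(15,10))
bottom = np.zeros(len(cats))
for t, dept in enumerate(index):
    plt.bar(cats, vals[t], bottom=bottom, label=dept, color=colors[t])
    bottom += vals[t]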
plt.plot([2005.5,2015.5],[avg_pre2016,avg_pre2016], color='#fc8d59')
plt.text(2009, avg_pre2016+0.1, 'average 2006-2015', fontsize=14, color='#fc8d59')
plt.plot([2015.5,2019.5],[avg_post2016,avg_post2016], color='#fc8d59')
plt.text(2016, avg_post2016+0.1, 'average 2016-2019', fontsize=14, color='#fc8d59')
plt.legend(loc='upper left')
plt.title('Ideas submitted to TTO each year')
plt.ylabel("Number of ideas")
plt.show()

#Total sum per column: 
df.loc['Total',:]= df.sum(axis=0)
#Total sum per row: 
df.loc[:,'Total'] = df.sum(axis=1)
print (df)
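The cell above counts submissions per calendar year with `list.count` over the full range 2006 to 2019, so years without any submission enter the averages as zeros. An equivalent pandas sketch, assuming hypothetical submission years, uses `reindex(..., fill_value=0)` so that empty years are not silently dropped from the mean.

```python
import pandas as pd

# Hypothetical submission years standing in for data.Year (not real data)
years = pd.Series([2006, 2008, 2008, 2016, 2016, 2017, 2019])

# Count ideas per year; years without submissions are added with a count of 0
per_year = years.value_counts().reindex(range(2006, 2020), fill_value=0)

avg_pre2016 = per_year.loc[2006:2015].mean()   # mean over 10 years
avg_post2016 = per_year.loc[2016:2019].mean()  # mean over 4 years
print(avg_pre2016, avg_post2016)
```

Without the `reindex`, the mean would be taken over the years that have at least one idea only, which would overstate the pre-2016 average.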

### Is the number of ideas submitted to TTO related to the popularity of innovation and entrepreneurship?
One way to measure popularity is Google search statistics. I checked the words "entrepreneurship" and "innovation" and their Norwegian equivalents "entreprenørskap" and "innovasjon". To make the trends comparable, each search series is smoothed with a fifth-degree polynomial fit and min-max normalized to the range 0 to 1. I cannot see a clear relationship between Google search popularity and the number of ideas. There may be some increase in popularity from about 2014 to 2015, but that is not conclusive.

In [None]:
# Path of the file to read
filepath = "../input/innovation-in-norway-google-search/google.csv"

# Read the file into a variable
dfg = pd.read_csv(filepath)
dfg.date = pd.to_datetime(dfg.date)
dfg = dfg[dfg.date >= '2006-01-01']
# Reset the index, because rows were dropped
dfg.reset_index(drop=True, inplace=True)


# curve fitting
x_plot = dfg.date

column_id = ['entreprenorskap', 'entrepreneurship', 'innovasjon','innovation']
for ele in column_id:
    x = dfg[ele].index
    y = dfg[ele]

    # calculate polynomial
    z = np.polyfit(x, y, 5)
    f = np.poly1d(z)

    # calculate new x's and y's
    x_new = x  # the fit could also be evaluated at other points, e.g. np.linspace(x[0], x[-1], 50)
    y_new = f(x_new)
    
    new_column_name = ele + '_fit'
    dfg[new_column_name] = y_new

In [None]:
# Normalization of the google search results
column_id = ['entreprenorskap_fit', 'entrepreneurship_fit', 'innovasjon_fit','innovation_fit']
for ele in column_id:
    dfg[ele] = (dfg[ele]-min(dfg[ele]))/(max(dfg[ele])-min(dfg[ele]))
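The two cells above smooth each search series with a fifth-degree `np.polyfit` on the row index and then min-max normalize the fit to [0, 1]. A sketch of the same two steps on a synthetic series (not the real Google data) uses `np.polynomial.Polynomial.fit`, which the NumPy documentation recommends for new code because it maps x onto a standard window internally and is numerically more stable.

```python
import numpy as np

# Synthetic monthly search-interest series (not real Google Trends data)
rng = np.random.default_rng(0)
x = np.arange(120)                      # row index, like dfg[ele].index above
y = 50 + 20 * np.sin(x / 20) + rng.normal(0, 5, x.size)

# Degree-5 least-squares fit; x is rescaled internally for better conditioning
trend = np.polynomial.Polynomial.fit(x, y, deg=5)
y_fit = trend(x)

# Min-max normalization to [0, 1], as in the normalization cell
y_norm = (y_fit - y_fit.min()) / (y_fit.max() - y_fit.min())
print(y_norm.min(), y_norm.max())
```

Both approaches solve the same least-squares problem, so the fitted curves agree up to rounding; the difference matters mainly for higher degrees or longer series.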

In [None]:
years = pd.to_datetime(data.Year, format='%Y')
#bins = pd.to_datetime(list(range(2006,2020)), format='%Y') 

fig, ax1 = plt.subplots(figsize=(15,10))
plt.title('Number of ideas submitted to TTO each year vs Google search popularity of innovation-related words')
ax1.hist(years, bins = 14, range = ('2005-12-31','2020-01-01'), color='#deebf7', ec="k")
ax1.set_ylabel('Number of ideas')
ax2 = ax1.twinx()

# One line per smoothed, normalized search term; the label is used by the legend
for ele in column_id:
    ax2.plot(dfg.date, dfg[ele], '-', label=ele)
ax2.set_ylabel('Search popularity')
ax2.legend()
plt.show()

### Is the number of ideas submitted to TTO related to oil prices?
The relationship is not clear. From 2016 onward there seems to be some kind of relationship, but this is not the case from 2009 to 2011, when oil prices recovered from their low in a similar way.

In [None]:
# Path of the file to read
filepath = "../input/brent-oil-prices/BrentOilPrices.csv"
# Read the file into a variable
dfo = pd.read_csv(filepath)
dfo.Date = pd.to_datetime(dfo.Date)
dfo = dfo[dfo.Date >= '2006-01-01']
# Reset the index, because rows were dropped
dfo.reset_index(drop=True, inplace=True)

In [None]:
fig, ax1 = plt.subplots(figsize=(15,10))
plt.title('Number of ideas submitted to TTO each year vs Brent oil price')
ax1.hist(years, bins = 14, range = ('2005-12-31','2020-01-01'), color='#deebf7', ec="k")
plt.ylabel('Number of ideas')
ax2 = ax1.twinx()
plt.ylabel('Oil price')
ax2.plot(dfo['Date'], dfo['Price'], '-', label='Oil price in $')
plt.legend(loc='upper left')
plt.show()
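Judging the relationship by eye is hard because the oil prices are daily while the idea counts are yearly. One rough way to put both on the same footing, not used in the original analysis, is to average the prices per calendar year and compute a Pearson correlation. The sketch below uses hypothetical numbers; with only 14 yearly points such a coefficient is fragile and says nothing about causation.

```python
import pandas as pd

# Hypothetical daily prices and yearly idea counts (not real data)
prices = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-05", "2015-06-01", "2016-01-04", "2016-07-01", "2017-03-01"]),
    "Price": [50.0, 60.0, 30.0, 40.0, 55.0],
})
ideas_per_year = pd.Series({2015: 1, 2016: 6, 2017: 8})

# Average the daily prices within each calendar year so both series share a yearly index
annual_price = prices.groupby(prices.Date.dt.year)["Price"].mean()

# Pearson correlation on the overlapping years (Series.corr aligns on the index)
r = annual_price.corr(ideas_per_year)
print(annual_price.to_dict(), round(r, 3))
```

With the real data, `prices` would correspond to `dfo` and `ideas_per_year` to the per-year counts of `data.Year`.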

### Does the increase in the number of ideas depend on the research field?
To better understand the phenomenon of 2016, the stacked bar chart below shows the ideas submitted each year, broken down by the research group they came from, which also reflects the industry area they belong to. The ideas were binned into six groups reflecting the actual structure of the department:

- Engineering - Engineering Geology and Rock Mechanics <br>
- Geology - Geology<br>
- Geophysics - Geophysics<br>
- Mineral - Mineral Production and HSE<br>
- Reservoir - Reservoir Engineering and Petrophysics<br>
- Well - Well Construction and Production Systems<br>

Interestingly, before 2016 the submitted ideas came mainly from two research groups: Engineering and Mineral. Each of these groups submitted about one idea every two years, at a steady rate. From 2016 on, these groups began submitting more ideas than before, and the other groups (Geology, Geophysics, Reservoir, and Well), which are closely linked to the petroleum industry, also began reporting to TTO.


In [None]:
# Research group vs time
df = pd.crosstab(data.Group, data.Year)
df = df.reindex(index = ['Engineering','Mineral','Geology','Geophysics','Reservoir','Well'], fill_value=0)

# Make a list of categories (years)
cats = df.columns.tolist()
# Make a list of values (number of ideas)
vals = df.values.tolist()
# Make a list of indexes (research groups)
index = df.index.tolist()

# Plot a stacked bar chart: each research group's bars start on top of the previous ones
colors = ['#d73027','#fc8d59','#fee090','#e0f3f8','#91bfdb','#4575b4']
plt.figure(figsize=(15,10))
bottom = np.zeros(len(cats))
for t, group in enumerate(index):
    plt.bar(cats, vals[t], bottom=bottom, label=group, color=colors[t])
    bottom += vals[t]
plt.legend(loc='upper left')
plt.title('Ideas submitted to TTO each year')
plt.ylabel("Number of ideas")
plt.show()

#Total sum per column: 
df.loc['Total',:]= df.sum(axis=0)
#Total sum per row: 
df.loc[:,'Total'] = df.sum(axis=1)
print (df)
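The statement that Engineering and Mineral each submitted about one idea every two years before 2016 is a rate per year within a period. A sketch, assuming hypothetical records with the notebook's `Group` and `Year` columns, splits the ideas into the periods 2006 to 2015 (10 years) and 2016 to 2019 (4 years) and divides the counts by the number of years in each period.

```python
import numpy as np
import pandas as pd

# Hypothetical records with the notebook's Group and Year columns (not real data)
toy = pd.DataFrame({
    "Group": ["Engineering", "Mineral", "Engineering", "Geology", "Well", "Mineral"],
    "Year": [2007, 2010, 2013, 2016, 2018, 2019],
})

# Label each idea with its period, then count ideas per group and period
period = np.where(toy.Year <= 2015, "2006-2015", "2016-2019")
counts = pd.crosstab(toy.Group, period)

# Divide by the length of each period (10 and 4 years) to get ideas per year
rate = counts / pd.Series({"2006-2015": 10, "2016-2019": 4})
print(rate)
```

A rate of 0.5 in the first column would correspond to one idea every two years.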

So how does the number of ideas look for the oil-related research groups? Is it related to oil prices? To some extent, yes: these groups began submitting innovative ideas when the oil price was at its bottom in 2015 and 2016 (see chart below). This follows a general pattern in the industry and among its contractors: when the oil price is low, the number of innovations aimed at cost reduction and efficiency increases. In addition, many research projects do not get funded because the industry cuts its research spending. Interestingly, however, no such onset of idea submissions can be observed during the previous oil price drop in 2008 and 2009. Perhaps the practice of reporting innovations to TTO was not yet mature at that time?

In [None]:
oil_groups_years = data.Year[data.Group.isin(['Reservoir', 'Geology', 'Well', 'Geophysics'])]
years = pd.to_datetime(oil_groups_years, format='%Y')

fig, ax1 = plt.subplots(figsize=(15,10))
plt.title('Number of ideas submitted to TTO from petroleum-related research groups each year vs Brent oil price')
ax1.hist(years, bins = 14, range = ('2005-12-31','2020-01-01'), color='#fee090', ec="k")
plt.ylabel('Number of ideas')
ax2 = ax1.twinx()
plt.ylabel('Oil price')
ax2.plot(dfo['Date'], dfo['Price'], '-', label='Oil price in $')
plt.legend(loc='upper left')
plt.show()

### Does the number of ideas depend on employee group?
I have shown some trends in submitted ideas by thematic/research group. But what about the employees themselves? Which employee groups do the ideas come from? Are they permanent or temporary staff? Seasoned researchers or newcomers? Maybe this helps explain the phenomenon of 2016. The stacked bar chart below shows the number of reported ideas per employee group and year. Before 2016, innovations were mostly submitted by professors, with some input from researchers and an engineer. In 2016, PhD candidates began reporting ideas and quickly overtook professors in the number of ideas. If we exclude ideas from temporary staff, the number of ideas per year fluctuates around the same level for the whole period 2006 to 2019. So why did ideas from temporary staff increase so unexpectedly? One may speculate that these years mark the beginning of a focus on innovation and entrepreneurship in Norway (although this is not reflected in the Google search interest shown earlier). More and more attention was paid to it at the university, and more resources were invested in education and support, especially for young researchers and students. But is there another explanation?

In [None]:
# Employee type vs time
df = pd.crosstab(data.Employee, data.Year)
df = df.reindex(index = ['Professor','AssocProf','Engineer','Researcher','PostDoc','PhD', 'Student'], fill_value=0)

# Make a list of categories (years)
cats = df.columns.tolist()
# Make a list of values (number of ideas)
vals = df.values.tolist()
# Make a list of indexes (employee types)
index = df.index.tolist()

# Plot a stacked bar chart: each employee group's bars start on top of the previous ones
colors = ['#d73027','#fc8d59','#fee090','#ffffbf','#e0f3f8','#91bfdb','#4575b4']

plt.figure(figsize=(15,10))
bottom = np.zeros(len(cats))
for t, group in enumerate(index):
    plt.bar(cats, vals[t], bottom=bottom, label=group, color=colors[t])
    bottom += vals[t]


plt.legend(index, loc='upper left')
plt.title('Number of ideas submitted to TTO since 2006 per employee group')
plt.ylabel('Number of ideas')
plt.show()

#Total sum per column: 
df.loc['Total',:]= df.sum(axis=0)
#Total sum per row: 
df.loc[:,'Total'] = df.sum(axis=1)
print (df)
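The claim that the yearly number of ideas stays roughly level once temporary staff are excluded can be checked by tagging each idea by staff type. A sketch under stated assumptions: hypothetical records, and PhD candidates, postdocs, and students (the employee groups used in the temporary-staff filter further below) treated as temporary, with everyone else treated as permanent, which is a simplification.

```python
import pandas as pd

# Assumption: these employee groups count as temporary staff; all others as permanent
TEMPORARY = ["PhD", "PostDoc", "Student"]

# Hypothetical records with the notebook's Employee and Year columns (not real data)
toy = pd.DataFrame({
    "Employee": ["Professor", "PhD", "PhD", "Professor", "PostDoc", "Engineer"],
    "Year": [2010, 2016, 2016, 2016, 2017, 2017],
})

# Tag each idea as coming from temporary or permanent staff, then count per year
staff = toy.Employee.isin(TEMPORARY).map({True: "temporary", False: "permanent"})
per_year = pd.crosstab(toy.Year, staff)
print(per_year)
```

The `permanent` column of `per_year` is then the yearly series without temporary staff.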

Interestingly, young employees (and a student) began reporting technology to TTO when the oil price was at its bottom. This might be expected given the lower number of jobs that traditionally are the next career step for young and temporary staff. When there are no jobs, setting up one's own company seems a reasonable career choice. The chart below shows the number of ideas from temporary staff working in petroleum-related research groups, together with the oil price and the number of employees in the Norwegian petroleum sector. From 2016 on we observe up to 4 ideas per year from temporary staff. The number of ideas from permanent staff working in oil-related subjects was smaller: usually 1 to 2 ideas per year (compare with the chart of the total number of ideas from staff working in oil-related subjects above). The oil crisis of 2015 and 2016 was very likely a trigger for temporary employees to consider alternative career choices, in this case technology transfer of their own work. Interestingly, this did not happen during the previous oil crisis. A possible reason is that the oil crisis of 2008 and 2009, unlike that of 2015 and 2016, did not reduce the number of employees in the Norwegian petroleum sector (see chart below).


In [None]:
# Path of the file to read
filepath = "../input/employment-in-petroleum-industry-in-norway/employment_petroleum_no.csv"
# Read the file into a variable
dfe = pd.read_csv(filepath)
dfe.Year = pd.to_datetime(dfe.Year, format='%Y')
dfe = dfe[dfe.Year >= '2006-01-01']
# Shift each date by about one year (364 days) so that it represents the end of that year
dfe.Year = dfe.Year + pd.to_timedelta(364, unit='d')
# Reset the index, because rows were dropped
dfe.reset_index(drop=True, inplace=True)

# change from string to float data type in Total
dfe = dfe.astype({"Total":'float'}) 
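Adding 364 days to 1 January lands on 31 December in ordinary years but on 30 December in leap years such as 2016, so those points are shifted one day short. For yearly data this hardly affects the chart; a sketch of an exact alternative uses the pandas offset `pd.offsets.YearEnd(0)`, which rolls each date forward to the end of its year.

```python
import pandas as pd

# Two example years; 2016 is a leap year
years = pd.Series([2015, 2016])
start = pd.to_datetime(years.astype(str), format="%Y")  # 1 January of each year

shifted = start + pd.to_timedelta(364, unit="d")  # shift used in the cell above
year_end = start + pd.offsets.YearEnd(0)           # roll forward to 31 December

print(shifted.dt.strftime("%Y-%m-%d").tolist())
print(year_end.dt.strftime("%Y-%m-%d").tolist())
```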

In [None]:
young_groups_years = data.Year[data.Employee.isin(['Student', 'PhD', 'PostDoc'])
                               & data.Group.isin(['Reservoir', 'Geology', 'Well', 'Geophysics'])]
years = pd.to_datetime(young_groups_years, format='%Y')

fig, ax1 = plt.subplots(figsize=(15,10))
plt.title('Number of ideas submitted to TTO from temporary staff working in petroleum-related research groups each year vs oil price '
          'and number of employees in the Norwegian petroleum sector')
ax1.hist(years, bins = 14, range = ('2005-12-31','2020-01-01'), color='#fc8d59', ec="k")
plt.ylabel('Number of ideas')
ax2 = ax1.twinx()
plt.ylabel('Oil price in $ / Number of employees in k')
ax2.plot(dfo['Date'], dfo['Price'], '-', label = 'Brent oil price')
ax2.plot(dfe['Year'], dfe['Total'], '-', label = 'Employees in petro. sector', color = '#d73027')
plt.legend(loc='upper left')
plt.show()

## Conclusions: why did tech transfer surge in 2016?
The large increase in the number of ideas reported to TTO from 2016 on was possibly related to the drop in oil prices in 2015 and 2016 and the associated cost reductions in the industry. The drop was likely a trigger for researchers to pay more attention to technology transfer, possibly as an effect of reduced research funding and fewer jobs in the petroleum industry. The oil crisis was a trigger mostly for temporary staff, who are likely to be sensitive to the job market. A few years after the extremely low oil prices, the number of innovations from temporary staff is still high, which may reflect either a pro-innovation change of mindset or continued uncertainty in the petroleum job market.