# Simple Intro to Pandas

This tutorial looks at the popular Python library Pandas, which is widely used to explore, clean, transform and wrangle large datasets.

**Tutorial Structure**
- [Preamble](#Preamble)
- [Import Data](#Import-Data)
 - [Creating DataFrames](#Creating-DataFrames)
 - [Read Files](#Read-Files)

# Preamble

In [1]:
%load_ext autoreload
%autoreload 2
# install the im_tutorials package
!pip install git+https://github.com/nestauk/im_tutorials.git

Collecting git+https://github.com/nestauk/im_tutorials.git
  Cloning https://github.com/nestauk/im_tutorials.git to /tmp/pip-req-build-ifijq3jx
  Running command git clone -q https://github.com/nestauk/im_tutorials.git /tmp/pip-req-build-ifijq3jx
Building wheels for collected packages: im-tutorials
  Building wheel for im-tutorials (setup.py) ... done
  Created wheel for im-tutorials: filename=im_tutorials-0.1.0-cp36-none-any.whl size=12596 sha256=92a0dae1b151a8fce917671609755fe96e3efad3feeda198d43e6edde962bd3c
  Stored in directory: /tmp/pip-ephem-wheel-cache-4hg8539_/wheels/47/a3/cb/bdc5f9ba49bcfd2c6864b166a1566eb2f104113bf0c3500330
Successfully built im-tutorials


In [2]:
# numpy for mathematical functions
import numpy as np
# pandas for handling tabular data
import pandas as pd
# explained later
from im_tutorials.data import cordis
import matplotlib.pyplot as plt

# Import Data

## Creating DataFrames

Sometimes it is handy to hardcode a small dataframe, for example when hacking on an idea or testing a method. This is one way to create a dataframe from scratch.

In [3]:
# useful for hacking
df_1 = pd.DataFrame(
    {'col1' : ['a', 'b', None,'c'],
    'col2' : ['d', 'e', 'f','g'],
    'col3' : [1, 2, 3, None],
    'col4' : [4, 5, 6, 7]}
)

In [None]:
df_1

## Read Files

In [None]:
# if working from a local .csv file
df = pd.read_csv('file.csv')
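
As a rough sketch of what `pd.read_csv` does, the block below reads a tiny made-up table. The column names and values are invented for illustration, and `io.StringIO` stands in for a file on disk so that the example runs without `file.csv`.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would live in a file such as 'file.csv'.
csv_text = """acronym,status,totalCost
ALPHA,SIGNED,1000.5
BETA,CLOSED,
GAMMA,SIGNED,250.0
"""

# read_csv accepts a path or any file-like object.
toy_csv_df = pd.read_csv(io.StringIO(csv_text))

print(toy_csv_df.shape)   # (rows, columns)
print(toy_csv_df.dtypes)  # the empty totalCost cell is read as a missing value
```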

In [35]:
# alternatively, load the CORDIS H2020 projects dataset from im_tutorials
cordis_projects_df = cordis.h2020_projects()


# A Look at the Data
<br/>
It is good practice to look at what is inside your dataset before you start answering questions with it. Pandas lets us easily explore the data and carry out basic analysis using the library's methods and functions.

Sometimes we just want a peek at what is inside. The methods `.head()` and `.tail()` display the first n rows and the last n rows, respectively. By default `n = 5`; you can adjust the number of rows by passing a different value of `n`.

In [26]:
cordis_projects_df.head(n=3)

Unnamed: 0,rcn,id,acronym,status,programme,topics,frameworkProgramme,title,startDate,endDate,...,fundingScheme,coordinator,coordinatorCountry,participants,participantCountries,subjects,organisations,countries,startYear,endYear
0,218249,822106,WeldGalaxy,SIGNED,"[H2020-EU.2.1.3., H2020-EU.2.1.5.1.]",DT-NMBP-20-2018,H2020,Digital Dynamic Knowledge Platform for Welding...,2018-10-01,2022-03-31,...,IA-LS,TWI LIMITED,UK,"[LULEA TEKNISKA UNIVERSITET, ROMSOFT SRL, TECH...","[SE, RO, UK, NO, PL, FR, IL, ES]",,"[TWI LIMITED, LULEA TEKNISKA UNIVERSITET, ROMS...","[UK, SE, RO, UK, NO, PL, FR, IL, ES]",2018.0,2022.0
1,218272,822064,MARKET4.0,SIGNED,"[H2020-EU.2.1.3., H2020-EU.2.1.5.1.]",DT-NMBP-20-2018,H2020,A Multi-Sided Business Platform for Plug and P...,2018-11-01,2022-04-30,...,IA-LS,INTRASOFT INTERNATIONAL SA,BE,"[N.BAZIGOS ABEE, NEDERLANDSE ORGANISATIE VOOR ...","[EL, NL, ES, DE, IT, FR, AT, LU, LT]",,"[INTRASOFT INTERNATIONAL SA, N.BAZIGOS ABEE, N...","[BE, EL, NL, ES, DE, IT, FR, AT, LU, LT]",2018.0,2022.0
2,216937,816551,SCOPIO,SIGNED,"[H2020-EU.3., H2020-EU.2.3., H2020-EU.2.1.]",EIC-SMEInst-2018-2020,H2020,"High-resolution, all-digital microscope that a...",2018-05-01,2018-08-31,...,SME-1,SCOPIO LABS LTD,IL,,,,[SCOPIO LABS LTD],[IL],2018.0,2018.0


In [None]:
cordis_projects_df.tail(n=3)

Dataframes consist of an index (the row labels) and columns (the column labels).

In [None]:
cordis_projects_df.index

In [None]:
cordis_projects_df.columns

There are cases where you may want to apply a calculation across rows or across columns. This is controlled by the `axis` parameter found in many methods (most default to `axis = 0`).
- Axis 0: apply down all the rows, giving one result per column
- Axis 1: apply across all the columns, giving one result per row
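
The two axes can be illustrated on a toy numerical dataframe. This is only a sketch: the frame `toy` and its values are made up, and `.sum()` stands in for any method that takes an `axis` parameter.

```python
import pandas as pd

# Toy numerical frame (values are illustrative only).
toy = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})

# axis=0: collapse the rows, giving one result per column.
col_totals = toy.sum(axis=0)

# axis=1: collapse the columns, giving one result per row.
row_totals = toy.sum(axis=1)

print(col_totals)
print(row_totals)
```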

<img src="../reports/figures/axis.png">


In [None]:
cordis_projects_df.dtypes
# dtype 'object' is pandas' way of saying there is non-numerical data in the column

In [None]:
cordis_projects_df.index

In [5]:
cordis_projects_df.columns

Index(['rcn', 'id', 'acronym', 'status', 'programme', 'topics',
       'frameworkProgramme', 'title', 'startDate', 'endDate', 'projectUrl',
       'objective', 'totalCost', 'ecMaxContribution', 'call', 'fundingScheme',
       'coordinator', 'coordinatorCountry', 'participants',
       'participantCountries', 'subjects', 'organisations', 'countries',
       'startYear', 'endYear'],
      dtype='object')

In [None]:
# columns can be looked at separately
cordis_projects_df['subjects']

In [None]:
# or pass a list of columns
cordis_projects_df[['status', 'subjects']]

In [None]:
cordis_projects_df['topics'].value_counts()

## Maths & Summaries
<br/>
Many methods summarise the data, either per column or per row.

In [None]:
df.sum()

In [None]:
# we can also use axis here 
df.sum(axis=1)

In [None]:
# count of number of elements present in column
df.count()

In [None]:
# or if you want the result of one column (can do this for any method)
df['col3'].count()

In [None]:
# only on numerical columns
df.describe()

In [None]:
# or get these results separately
df.mean()

Now try other methods such as `.min()`, `.max()`, `.median()`, `.var()` and `.std()`.

In [None]:
# write code here

In [None]:
# columns can be added together element-wise

df['col3'] + df['col4']

Now try adding two columns that have different datatypes and see what happens.


In [None]:
# write code here

# Filtering & Subsets

In [None]:
# getting the subset of where the condition is true 
cordis_projects_df[cordis_projects_df['coordinatorCountry'] == 'UK'].head()

In [15]:
# alternatively, use .loc with the same condition

cordis_projects_df.loc[cordis_projects_df['coordinatorCountry'] == 'UK']

Unnamed: 0,rcn,id,acronym,status,programme,topics,frameworkProgramme,title,startDate,endDate,...,fundingScheme,coordinator,coordinatorCountry,participants,participantCountries,subjects,organisations,countries,startYear,endYear
0,218249,822106,WeldGalaxy,SIGNED,"[H2020-EU.2.1.3., H2020-EU.2.1.5.1.]",DT-NMBP-20-2018,H2020,Digital Dynamic Knowledge Platform for Welding...,2018-10-01,2022-03-31,...,IA-LS,TWI LIMITED,UK,"[LULEA TEKNISKA UNIVERSITET, ROMSOFT SRL, TECH...","[SE, RO, UK, NO, PL, FR, IL, ES]",,"[TWI LIMITED, LULEA TEKNISKA UNIVERSITET, ROMS...","[UK, SE, RO, UK, NO, PL, FR, IL, ES]",2018.0,2022.0
22,210626,755667,BreathSpec,SIGNED,"[H2020-EU.3., H2020-EU.2.]",FTIPilot-01-2016,H2020,"A rapid, non-invasive, cost-effective, analyti...",2017-05-01,2019-04-30,...,IA,IMSPEX DIAGNOSTICS LIMITED,UK,"[REDKNIGHT CONSULTANCY LTD, THE UNIVERSITY OF ...","[UK, IE, DE]",,"[IMSPEX DIAGNOSTICS LIMITED, REDKNIGHT CONSULT...","[UK, UK, IE, DE]",2017.0,2019.0
24,208925,747461,Robust OTFT sensors,SIGNED,[H2020-EU.1.3.2.],MSCA-IF-2016,H2020,"Ultra-robust, flexible organic sensors for app...",2017-07-01,2019-06-30,...,MSCA-IF-GF,THE CHANCELLOR MASTERS AND SCHOLARS OF THE UNI...,UK,,,,[THE CHANCELLOR MASTERS AND SCHOLARS OF THE UN...,[UK],2017.0,2019.0
28,208684,750294,NAPANODE,TERMINATED,[H2020-EU.1.3.2.],MSCA-IF-2016,H2020,Molecular Foundation of Structural and Dynamic...,2017-03-01,2019-02-28,...,MSCA-IF-EF-ST,THE CHANCELLOR MASTERS AND SCHOLARS OF THE UNI...,UK,,,,[THE CHANCELLOR MASTERS AND SCHOLARS OF THE UN...,[UK],2017.0,2019.0
30,208574,745808,BCSC-ST,TERMINATED,[H2020-EU.1.3.2.],MSCA-IF-2016,H2020,Breast cancer stem-like cells specific vulnera...,2017-03-01,2019-02-28,...,MSCA-IF-EF-RI,UNIVERSITY OF NEWCASTLE UPON TYNE,UK,,,,[UNIVERSITY OF NEWCASTLE UPON TYNE],[UK],2017.0,2019.0
36,208532,739795,ASSIGN,SIGNED,[H2020-EU.2.3.2.2.],INNOSUP-02-2016,H2020,Get on top of your multimedia content,2017-09-01,2018-08-31,...,CSA,IN2 SEARCH INTERFACES DEVELOPMENT LIMITED,UK,,,,[IN2 SEARCH INTERFACES DEVELOPMENT LIMITED],[UK],2017.0,2018.0
39,207376,714427,INNOVATION,SIGNED,[H2020-EU.1.1.],ERC-2016-STG,H2020,Authority and Innovation in Early Franciscan T...,2017-01-01,2021-12-31,...,ERC-STG,KING'S COLLEGE LONDON,UK,[CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE ...,[FR],,"[KING'S COLLEGE LONDON, CENTRE NATIONAL DE LA ...","[UK, FR]",2017.0,2021.0
50,206181,732194,QROWD,SIGNED,[H2020-EU.2.1.1.],ICT-14-2016-2017,H2020,QROWD - Because Big Data Integration is Humanl...,2016-12-01,2019-11-30,...,IA,UNIVERSITY OF SOUTHAMPTON,UK,"[COMUNE DI TRENTO, INMARK EUROPA SA, TOMTOM DE...","[IT, ES, DE, CH]",,"[UNIVERSITY OF SOUTHAMPTON, COMUNE DI TRENTO, ...","[UK, IT, ES, DE, CH]",2016.0,2019.0
59,206125,705600,MultimodalCellTrack,SIGNED,[H2020-EU.1.3.2.],MSCA-IF-2015-EF,H2020,Multimodal preclinical imaging probes to evalu...,2016-08-01,2018-07-31,...,MSCA-IF-EF-ST,THE UNIVERSITY OF LIVERPOOL,UK,,,,[THE UNIVERSITY OF LIVERPOOL],[UK],2016.0,2018.0
66,206460,704382,BeeSymOverSpace,SIGNED,[H2020-EU.1.3.2.],MSCA-IF-2015-GF,H2020,How to help the hive? Incidence and impact of ...,2016-10-01,2019-05-31,...,MSCA-IF-GF,THE UNIVERSITY OF LIVERPOOL,UK,,,,[THE UNIVERSITY OF LIVERPOOL],[UK],2016.0,2019.0


In [24]:
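# .iloc versus .loc
# .iloc works on position: the top 4 rows (shows rows 0,1,2,3)
cordis_projects_df.iloc[:4]

# .loc works on labels: up to and including index label 4 (shows rows 0,1,2,3,4)
cordis_projects_df.loc[:4]

# subsets can also be taken on whether values are null or not null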



Unnamed: 0,rcn,id,acronym,status,programme,topics,frameworkProgramme,title,startDate,endDate,...,fundingScheme,coordinator,coordinatorCountry,participants,participantCountries,subjects,organisations,countries,startYear,endYear
0,218249,822106,WeldGalaxy,SIGNED,"[H2020-EU.2.1.3., H2020-EU.2.1.5.1.]",DT-NMBP-20-2018,H2020,Digital Dynamic Knowledge Platform for Welding...,2018-10-01,2022-03-31,...,IA-LS,TWI LIMITED,UK,"[LULEA TEKNISKA UNIVERSITET, ROMSOFT SRL, TECH...","[SE, RO, UK, NO, PL, FR, IL, ES]",,"[TWI LIMITED, LULEA TEKNISKA UNIVERSITET, ROMS...","[UK, SE, RO, UK, NO, PL, FR, IL, ES]",2018.0,2022.0
1,218272,822064,MARKET4.0,SIGNED,"[H2020-EU.2.1.3., H2020-EU.2.1.5.1.]",DT-NMBP-20-2018,H2020,A Multi-Sided Business Platform for Plug and P...,2018-11-01,2022-04-30,...,IA-LS,INTRASOFT INTERNATIONAL SA,BE,"[N.BAZIGOS ABEE, NEDERLANDSE ORGANISATIE VOOR ...","[EL, NL, ES, DE, IT, FR, AT, LU, LT]",,"[INTRASOFT INTERNATIONAL SA, N.BAZIGOS ABEE, N...","[BE, EL, NL, ES, DE, IT, FR, AT, LU, LT]",2018.0,2022.0
2,216937,816551,SCOPIO,SIGNED,"[H2020-EU.3., H2020-EU.2.3., H2020-EU.2.1.]",EIC-SMEInst-2018-2020,H2020,"High-resolution, all-digital microscope that a...",2018-05-01,2018-08-31,...,SME-1,SCOPIO LABS LTD,IL,,,,[SCOPIO LABS LTD],[IL],2018.0,2018.0
3,218836,811181,MapProdIGI,SIGNED,[H2020-EU.3.6.2.1.],IBA-SC6-Nexus-2017,H2020,Microdata analysis for Policies for Productivi...,2018-07-01,2021-06-30,...,CSA,ORGANISATION FOR ECONOMIC CO-OPERATION AND DEV...,FR,,,,[ORGANISATION FOR ECONOMIC CO-OPERATION AND DE...,[FR],2018.0,2021.0
4,218837,814729,SSH Impact,SIGNED,[H2020-EU.3.6.2.1.],IBA-SC6-AUSTRIA-2018,H2020,Conference on the “Impact of Social Sciences...,2018-03-01,2019-02-28,...,CSA,ZENTRUM FUR SOZIALE INNOVATION GMBH,AT,,,,[ZENTRUM FUR SOZIALE INNOVATION GMBH],[AT],2018.0,2019.0
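
The cell above leaves subsetting on null / not-null values as a note. A minimal sketch of how that might look, using the `.isnull()` and `.notnull()` methods on a made-up toy frame (a cut-down copy of `df_1`):

```python
import pandas as pd

# Toy frame with missing values, mirroring part of df_1 from earlier.
toy = pd.DataFrame({
    'col1': ['a', 'b', None, 'c'],
    'col3': [1, 2, 3, None],
})

# Rows where col1 is missing.
missing_col1 = toy[toy['col1'].isnull()]

# Rows where col3 is present.
present_col3 = toy[toy['col3'].notnull()]

print(missing_col1)
print(present_col3)
```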


In [25]:
# cordis_projects_df

In [None]:
# .loc is based on the names of the labels in the index
# .iloc is based on the position in the index
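
The difference is easier to see when the index labels are not simply 0, 1, 2, ... The sketch below uses a toy frame with invented string labels; the names `toy`, `by_position`, `by_label` and `expensive` are illustrative only.

```python
import pandas as pd

# Toy frame whose index labels are strings (hypothetical project codes).
toy = pd.DataFrame(
    {'totalCost': [100, 200, 300, 400]},
    index=['p1', 'p2', 'p3', 'p4'],
)

# .iloc slices by position and excludes the end position.
by_position = toy.iloc[0:2]

# .loc slices by label and includes the end label.
by_label = toy.loc['p1':'p3']

# .loc also accepts a boolean condition plus a column selection.
expensive = toy.loc[toy['totalCost'] > 250, 'totalCost']

print(by_position)
print(by_label)
print(expensive)
```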

# Data Wrangling

## Cleaning

### Dropping Data

In [None]:
# drop columns (`columns` is a single column name or a list of names)
df.drop(columns, inplace=True, axis=1)

In [None]:
# drop rows (`rows` is a single index label or a list of labels)
df.drop(rows, inplace=True, axis=0)
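
In the cells above, `columns` and `rows` are placeholders. A small runnable sketch on a made-up toy frame, using the `columns=` and `index=` keywords of `.drop()` as an alternative to `axis`:

```python
import pandas as pd

toy = pd.DataFrame({
    'col1': ['a', 'b', 'c'],
    'col2': ['d', 'e', 'f'],
    'col3': [1, 2, 3],
})

# Drop one column by name; a list such as ['col2', 'col3'] drops several.
without_col2 = toy.drop(columns='col2')

# Drop rows by index label.
without_first_row = toy.drop(index=[0])

# Without inplace=True the original frame is left unchanged.
print(toy.shape, without_col2.shape, without_first_row.shape)
```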

## Handling Missing Data

Descriptions

In [None]:
# another way to drop columns: how='any' drops every column that contains at least one NaN
# (try this on the toy dataframe)
df.dropna(axis = 1, how='any')

In [None]:
# likewise for rows: how='any' drops every row that contains at least one NaN; here axis=0 by default
df.dropna(how='any')
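
A sketch of `how='any'` on a toy dataframe with the same values as `df_1`, redefined here as `toy` so the block runs on its own:

```python
import pandas as pd

toy = pd.DataFrame({
    'col1': ['a', 'b', None, 'c'],
    'col2': ['d', 'e', 'f', 'g'],
    'col3': [1, 2, 3, None],
    'col4': [4, 5, 6, 7],
})

# axis=1: drop every column containing at least one missing value.
complete_columns = toy.dropna(axis=1, how='any')

# default axis=0: drop every row containing at least one missing value.
complete_rows = toy.dropna(how='any')

print(complete_columns)
print(complete_rows)
```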

There are other options for the `how` parameter. See what happens when `how` equals `'all'`.

Drop duplicates (rows)

In [None]:
df.drop_duplicates()

#can drop duplicates based on one column 
df.drop_duplicates(subset="col", inplace=True)
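
A small sketch of both calls on made-up data; the column names `country` and `status` are illustrative only.

```python
import pandas as pd

toy = pd.DataFrame({
    'country': ['UK', 'UK', 'FR', 'UK'],
    'status': ['SIGNED', 'SIGNED', 'SIGNED', 'CLOSED'],
})

# Rows 0 and 1 are identical, so only the first of them is kept.
unique_rows = toy.drop_duplicates()

# With subset, rows count as duplicates when they match on that column alone.
unique_countries = toy.drop_duplicates(subset='country')

print(unique_rows)
print(unique_countries)
```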

Filter (using conditions)

## Rename & Resets

In [None]:
# renaming columns
# reset index
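
The cell above only names the two operations. One way they might look, sketched on a toy frame with `.rename()` and `.reset_index()`; the new column names are made up.

```python
import pandas as pd

toy = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col3': [1, 2, 3]})

# rename takes a mapping of old names to new names.
renamed = toy.rename(columns={'col1': 'letter', 'col3': 'number'})

# After filtering, the index keeps the original labels (here 1 and 2).
filtered = renamed[renamed['number'] > 1]

# reset_index renumbers from 0; drop=True discards the old labels
# instead of keeping them as a new column.
renumbered = filtered.reset_index(drop=True)

print(renumbered)
```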

## Transformation

In [None]:
# add new columns
cordis_projects_df['half_totalCost'] = cordis_projects_df['totalCost'] * 0.5
# group data, apply a function and merge
# refer back to data types and how we change the datatype of a column

In [None]:
#transpose data

df.T

### Mapping

In [None]:
# dictionary

In [None]:
# 
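
A minimal sketch of mapping with a dictionary, assuming the intent is to translate the values of a column through a lookup table with `Series.map`. The lookup table `country_names` and the code `'XX'` are invented for illustration.

```python
import pandas as pd

toy = pd.DataFrame({'coordinatorCountry': ['UK', 'FR', 'BE', 'XX']})

# Hypothetical lookup table from country code to country name.
country_names = {'UK': 'United Kingdom', 'FR': 'France', 'BE': 'Belgium'}

# Series.map looks each value up in the dictionary;
# codes without an entry become missing values.
toy['countryName'] = toy['coordinatorCountry'].map(country_names)

print(toy)
```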

### GroupBy

In [None]:
# preferably group by categorical-type columns
grouped_df = cordis_projects_df.groupby(by=['status', 'ecMaxContribution'])

Here, we can investigate statistical results of each numerical column based on the groups defined, by applying the methods from earlier.

In [None]:
grouped_df.mean()

In [None]:
grouped_df.sum()

In [None]:
# values can be replaced
# avoid lambda where possible; use a numpy function instead
grouped_df['totalCost'].apply(lambda x: x/x.count()) # each value divided by the count of its group
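
As a sketch of grouping without a lambda, the block below uses pandas' built-in aggregation names (a swapped-in alternative to the numpy functions mentioned in the comment) on a made-up toy frame.

```python
import pandas as pd

toy = pd.DataFrame({
    'status': ['SIGNED', 'SIGNED', 'CLOSED', 'CLOSED', 'CLOSED'],
    'totalCost': [100.0, 300.0, 50.0, 150.0, 100.0],
})

grouped = toy.groupby('status')

# A single built-in aggregation per group.
mean_cost = grouped['totalCost'].mean()

# Several aggregations at once, named by string rather than written as lambdas.
summary = grouped['totalCost'].agg(['sum', 'mean', 'count'])

# transform returns one value per original row:
# here each cost as a share of its group total.
toy['shareOfGroup'] = toy['totalCost'] / grouped['totalCost'].transform('sum')

print(summary)
print(toy)
```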

In [None]:
cordis_projects_df.head()

In [None]:
# can group by on different levels and calculate on different levels

In [None]:
# careful: some columns may hold integers that are really IDs,
# so it is up to you to decide what to do with them

# Combining Data

Firstly, let's define another toy example.

In [27]:
df_2= pd.DataFrame(
    {
        'col1': ['a', 'b', 1, 2], 
        'col2': ['d', 'e', 'f', 'k'], 
        'col3': [5, 6, 7, 8],
        'col4': ['h', 'i', 'j', 'k']
    }

)

In [28]:
df_1

Unnamed: 0,col1,col2,col3,col4
0,a,d,1.0,4
1,b,e,2.0,5
2,,f,3.0,6
3,c,g,,7


In [29]:
df_2

Unnamed: 0,col1,col2,col3,col4
0,a,d,5,h
1,b,e,6,i
2,1,f,7,j
3,2,k,8,k


### Merge

In Pandas, there are various ways to merge: `left`, `right`, `inner`, `outer`. Here, we have to specify the type of merge with `how` and the column to merge on with `on`.

In [None]:
# merge: how='left' makes the first dataframe the main one and merges the other into it
# note: the key column must be present in both dataframes for a smooth merge
# note: the key column of the main dataframe is kept as it is; rows from the other dataframe are included only where their key matches
# pd.merge(
#     df_1,
#     df_2,
#     how='left', 
#     on= 'col2'
# )



In [30]:
pd.merge(
    df_1,
    df_2,
    how='right', 
    on= 'col2'
)

Unnamed: 0,col1_x,col2,col3_x,col4_x,col1_y,col3_y,col4_y
0,a,d,1.0,4.0,a,5,h
1,b,e,2.0,5.0,b,6,i
2,,f,3.0,6.0,1,7,j
3,,k,,,2,8,k


In [None]:
# intersection: elements of col2 which appear in both dataframes
pd.merge(
    df_1,
    df_2,
    how='inner', 
    on= 'col2'
)

In [None]:
#union 
pd.merge(
    df_1,
    df_2,
    how='outer', 
    on= 'col2'
)

# use left_on and right_on when the key columns have different names
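
A short sketch of `left_on` / `right_on`, assuming two made-up frames whose key columns carry different names (`coordinatorCountry` on the left, `code` on the right):

```python
import pandas as pd

projects = pd.DataFrame({
    'acronym': ['ALPHA', 'BETA', 'GAMMA'],
    'coordinatorCountry': ['UK', 'FR', 'UK'],
})
countries = pd.DataFrame({
    'code': ['UK', 'FR'],
    'name': ['United Kingdom', 'France'],
})

# The key is called coordinatorCountry on the left and code on the right.
merged = pd.merge(
    projects,
    countries,
    how='left',
    left_on='coordinatorCountry',
    right_on='code',
)

print(merged)
```

Both key columns are kept in the result, so one of them can be dropped afterwards if it is not needed.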

In [31]:
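# join works similarly, but matches against the index of the other dataframe, so the key does not always need defining
# here on='col2' compares df_1['col2'] (strings) with the integer index of df_2, which raises the error shown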

df_1.join(df_2, 'col2', how='right')

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
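
One possible way around this error, sketched under the assumption that matching on `col2` was the intent: move `col2` into the index of `df_2` first, and add suffixes for the column names the two frames share. The frames are redefined here so the block runs on its own; `right` and `joined` are illustrative names.

```python
import pandas as pd

df_1 = pd.DataFrame({
    'col1': ['a', 'b', None, 'c'],
    'col2': ['d', 'e', 'f', 'g'],
    'col3': [1, 2, 3, None],
    'col4': [4, 5, 6, 7],
})
df_2 = pd.DataFrame({
    'col1': ['a', 'b', 1, 2],
    'col2': ['d', 'e', 'f', 'k'],
    'col3': [5, 6, 7, 8],
    'col4': ['h', 'i', 'j', 'k'],
})

# Move col2 into the index of the right-hand frame so join can match against it.
right = df_2.set_index('col2')

# Both frames share col1, col3 and col4, so suffixes keep the names apart.
joined = df_1.join(right, on='col2', how='right', lsuffix='_x', rsuffix='_y')

print(joined)
```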

In [None]:
# concatenate: a bit like stacking
# used to append two or more dataframes on top of each other or side by side
pd.concat([df_1,df_2]) # default axis=0

In [None]:
pd.concat([df_1,df_2], axis=1) # concatenated side by side, aligned on the index

# Working with...

Here, we will look at different non-numeric data types and how we can work with them in pandas.

## DateTime
<br />
In a lot of cases, datetime information is stored as strings. Thankfully, pandas can deal with this: the strings can be transformed to datetime objects.

In [32]:
#in this dataset, these dates are already datetime objects 
#write example
cordis_projects_df['startDate'] = pd.to_datetime(cordis_projects_df['startDate'])
cordis_projects_df

Unnamed: 0,rcn,id,acronym,status,programme,topics,frameworkProgramme,title,startDate,endDate,...,fundingScheme,coordinator,coordinatorCountry,participants,participantCountries,subjects,organisations,countries,startYear,endYear
0,218249,822106,WeldGalaxy,SIGNED,"[H2020-EU.2.1.3., H2020-EU.2.1.5.1.]",DT-NMBP-20-2018,H2020,Digital Dynamic Knowledge Platform for Welding...,2018-10-01,2022-03-31,...,IA-LS,TWI LIMITED,UK,"[LULEA TEKNISKA UNIVERSITET, ROMSOFT SRL, TECH...","[SE, RO, UK, NO, PL, FR, IL, ES]",,"[TWI LIMITED, LULEA TEKNISKA UNIVERSITET, ROMS...","[UK, SE, RO, UK, NO, PL, FR, IL, ES]",2018.0,2022.0
1,218272,822064,MARKET4.0,SIGNED,"[H2020-EU.2.1.3., H2020-EU.2.1.5.1.]",DT-NMBP-20-2018,H2020,A Multi-Sided Business Platform for Plug and P...,2018-11-01,2022-04-30,...,IA-LS,INTRASOFT INTERNATIONAL SA,BE,"[N.BAZIGOS ABEE, NEDERLANDSE ORGANISATIE VOOR ...","[EL, NL, ES, DE, IT, FR, AT, LU, LT]",,"[INTRASOFT INTERNATIONAL SA, N.BAZIGOS ABEE, N...","[BE, EL, NL, ES, DE, IT, FR, AT, LU, LT]",2018.0,2022.0
2,216937,816551,SCOPIO,SIGNED,"[H2020-EU.3., H2020-EU.2.3., H2020-EU.2.1.]",EIC-SMEInst-2018-2020,H2020,"High-resolution, all-digital microscope that a...",2018-05-01,2018-08-31,...,SME-1,SCOPIO LABS LTD,IL,,,,[SCOPIO LABS LTD],[IL],2018.0,2018.0
3,218836,811181,MapProdIGI,SIGNED,[H2020-EU.3.6.2.1.],IBA-SC6-Nexus-2017,H2020,Microdata analysis for Policies for Productivi...,2018-07-01,2021-06-30,...,CSA,ORGANISATION FOR ECONOMIC CO-OPERATION AND DEV...,FR,,,,[ORGANISATION FOR ECONOMIC CO-OPERATION AND DE...,[FR],2018.0,2021.0
4,218837,814729,SSH Impact,SIGNED,[H2020-EU.3.6.2.1.],IBA-SC6-AUSTRIA-2018,H2020,Conference on the “Impact of Social Sciences...,2018-03-01,2019-02-28,...,CSA,ZENTRUM FUR SOZIALE INNOVATION GMBH,AT,,,,[ZENTRUM FUR SOZIALE INNOVATION GMBH],[AT],2018.0,2019.0
5,218833,811163,GLORIA,SIGNED,[H2020-EU.3.6.2.1.],IBA-SC6-Industrial-2017,H2020,Global Industrial Research & Innovation Analyses,2018-06-01,2020-11-30,...,CSA,JRC -JOINT RESEARCH CENTRE- EUROPEAN COMMISSION,BE,,,,[JRC -JOINT RESEARCH CENTRE- EUROPEAN COMMISSION],[BE],2018.0,2020.0
6,217710,816708,Biopsy X,SIGNED,"[H2020-EU.3., H2020-EU.2.3., H2020-EU.2.1.]",EIC-SMEInst-2018-2020,H2020,Endodrill® Model X - a new endoscopic biopsy ...,2018-03-01,2018-08-31,...,SME-1,BIBBINSTRUMENTS AB,SE,,,,[BIBBINSTRUMENTS AB],[SE],2018.0,2018.0
7,217626,815751,SIWI,SIGNED,"[H2020-EU.3., H2020-EU.2.3., H2020-EU.2.1.]",EIC-SMEInst-2018-2020,H2020,Automatic Hitching for total Safety of Farmers...,2018-05-01,2018-08-31,...,SME-1,SIWI MASKINER APS,DK,,,,[SIWI MASKINER APS],[DK],2018.0,2018.0
8,211066,761508,5GCITY,SIGNED,[H2020-EU.2.1.1.],ICT-08-2017,H2020,5GCITY,2017-06-01,2019-11-30,...,IA,"FUNDACIO PRIVADA I2CAT, INTERNET I INNOVACIO D...",ES,[COMUNICARE DIGITALE - ASSOCIAZIONEDI PROMOZIO...,"[IT, ES, FR, BE, DE, UK, LU, PT]",,"[FUNDACIO PRIVADA I2CAT, INTERNET I INNOVACIO ...","[ES, IT, ES, FR, BE, DE, UK, LU, PT]",2017.0,2019.0
9,215832,801051,EPEEC,SIGNED,[H2020-EU.1.2.2.],FETHPC-02-2017,H2020,European joint Effort toward a Highly Producti...,2018-10-01,2021-09-30,...,RIA,BARCELONA SUPERCOMPUTING CENTER - CENTRO NACIO...,ES,[INSTITUT NATIONAL DE RECHERCHE ENINFORMATIQUE...,"[FR, SE, DE, BE, ES, PT, IT]",,[BARCELONA SUPERCOMPUTING CENTER - CENTRO NACI...,"[ES, FR, SE, DE, BE, ES, PT, IT]",2018.0,2021.0
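
Since the dates in this dataset are already datetime objects, here is a sketch of the conversion on a toy column where they really are strings. The frame `toy` and its values are made up.

```python
import pandas as pd

# Dates stored as strings, plus one missing entry.
toy = pd.DataFrame({'startDate': ['2018-10-01', '2018-11-01', None]})
print(toy['startDate'].dtype)  # dtype before conversion

# Convert to datetime; the missing entry becomes NaT ("not a time").
toy['startDate'] = pd.to_datetime(toy['startDate'])
print(toy['startDate'].dtype)  # dtype after conversion

# The .dt accessor exposes datetime parts for the whole column at once.
toy['startYear'] = toy['startDate'].dt.year

print(toy)
```

When a column contains missing dates, the extracted parts come back as floats, because the missing entries are represented as NaN.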


In [36]:
# for example, the first row 
print(cordis_projects_df['startDate'][0].year)
print(cordis_projects_df['startDate'][0].month)
print(cordis_projects_df['startDate'][0].day)

2018
10
1


In [34]:
# the .dt accessor applies a datetime attribute to every element of the column;
# a lambda with .apply() is an alternative when a custom function is needed
cordis_projects_df['month'] = cordis_projects_df['startDate'].dt.month
cordis_projects_df['month'].head()

0    10.0
1    11.0
2     5.0
3     7.0
4     3.0
Name: month, dtype: float64

## Strings

In [None]:
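# the .str accessor gives string methods on a column (convert other types first with .astype(str))

# .str.len() to get the length of each string down the column (axis 0)

# .str.replace() to replace a substring in each string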
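
A sketch of these string methods on a toy column. The values are borrowed from the `acronym` column shown earlier, and the new column names are made up.

```python
import pandas as pd

toy = pd.DataFrame({'acronym': ['WeldGalaxy', 'MARKET4.0', 'SCOPIO']})

# Length of every string in the column.
toy['length'] = toy['acronym'].str.len()

# Lower-case every string.
toy['lower'] = toy['acronym'].str.lower()

# Replace a literal substring; regex=False treats the pattern as plain text.
toy['noDots'] = toy['acronym'].str.replace('.', '', regex=False)

# Non-string columns can be converted first with astype(str).
toy['lengthAsText'] = toy['length'].astype(str)

print(toy)
```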

# Plotting

## Pandas Plotting
https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html

In [None]:
# histogram plot - all pandas plots are matplotlib figures, but are called through pandas methods
cordis_projects_df['month'].hist()
# plt.show()

In [None]:
cordis_projects_df.columns

In [None]:
cordis_projects_df['totalCost'].plot(kind='bar')

In [None]:
# grouped_df.size().unstack().plot(kind= 'bar', stacked=True)

# show a few examples directly from the dataframe

## Plotting using Matplotlib

https://matplotlib.org/3.1.1/contents.html

In [None]:
# to show that dataframes can be passed straight into matplotlib
# one example with matplotlib
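
One possible example, sketched on made-up counts: the figure and axes are created with matplotlib, pandas draws onto them, and the labels are then set with ordinary matplotlib calls. The `Agg` backend is selected only so the block runs without a display; in a notebook that line can be dropped.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy data: number of projects per status (values are invented).
toy = pd.DataFrame({
    'status': ['SIGNED', 'CLOSED', 'TERMINATED'],
    'projects': [12, 5, 2],
})

# Create the figure and axes with matplotlib, then hand the axes to pandas.
fig, ax = plt.subplots(figsize=(6, 4))
toy.plot(kind='bar', x='status', y='projects', ax=ax, legend=False)

# From here on, ordinary matplotlib calls customise the plot.
ax.set_xlabel('Project status')
ax.set_ylabel('Number of projects')
ax.set_title('Projects per status (toy data)')
fig.tight_layout()
```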

# Available Datasets

How to read these