# Intro to Pandas

[Data Science Handbook (with notebooks!)](https://jakevdp.github.io/PythonDataScienceHandbook/)

[Basics of Pandas](https://towardsdatascience.com/6-basic-pandas-techniques-you-need-to-know-2c5725746938)

[Pandas cheat sheet](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)

[Working with rows and columns](https://www.geeksforgeeks.org/dealing-with-rows-and-columns-in-pandas-dataframe/)

Pandas is a Python library for data manipulation and analysis, widely used in data science. A Pandas *DataFrame* is a table with rows and columns. Typically each row holds one data point and each column holds one feature of that data point.
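
As a minimal sketch of this layout, the made-up table below has three data points (rows) and two features (columns). The column names and values are hypothetical and are not taken from the datasets used later.

```python
import pandas as pd

# Hypothetical toy data: each row is one data point, each column one feature.
toy = pd.DataFrame({
    'height_cm': [150, 162, 171],
    'weight_kg': [45.0, 58.5, 70.2],
})

n_rows, n_cols = toy.shape   # (number of data points, number of features)
print(n_rows, n_cols)
print(list(toy.columns))
```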

In [1]:
import pandas as pd
from sklearn import datasets

## Converting from format X to DataFrame

In [2]:
# List of numbers to DataFrame:

num_list = [1,2,3,4,5]
df = pd.DataFrame(num_list)
df

Unnamed: 0,0
0,1
1,2
2,3
3,4
4,5


In [3]:
# List of tuples to DataFrame:

num_list = [(1,2),(3,4),(5,3)]
df = pd.DataFrame(num_list)
df

Unnamed: 0,0,1
0,1,2
1,3,4
2,5,3


In [4]:
# Dictionary to DataFrame:

data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
pd.DataFrame.from_dict(data)

Unnamed: 0,col_1,col_2
0,3,a
1,2,b
2,1,c
3,0,d
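
As a small aside on the conversions so far, the sketch below suggests that the plain `pd.DataFrame` constructor also accepts a dictionary of columns, and that column names can be given explicitly for a list of tuples. The names `first` and `second` are made up for illustration.

```python
import pandas as pd

data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}

# The plain constructor accepts a dict of columns as well.
df_ctor = pd.DataFrame(data)
df_from_dict = pd.DataFrame.from_dict(data)
print(df_ctor.equals(df_from_dict))

# For a list of tuples, column names can be supplied explicitly
# ('first' and 'second' are hypothetical names).
pairs = pd.DataFrame([(1, 2), (3, 4), (5, 3)], columns=['first', 'second'])
print(pairs)
```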


In [5]:
# Fixed-width text file (columns aligned with spaces) to DataFrame:

df = pd.read_fwf('../datasets/primitivo.txt')
df

Unnamed: 0,x y
0,10.0 8.04
1,8.0 6.95
2,13.0 7.58
3,9.0 8.81
4,11.0 8.33
5,14.0 9.96
6,6.0 7.24
7,4.0 4.26
8,12.0 10.84
9,7.0 4.82
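
To try this without the file on disk, the sketch below reads a small in-memory stand-in. The three data rows are copied from the output above, but the exact spacing of `primitivo.txt` is an assumption. It also shows `pd.read_csv` with a whitespace separator as a possible alternative for space-delimited text.

```python
import io
import pandas as pd

# Stand-in for a space-aligned text file; the real file layout is assumed.
text = (
    "x     y\n"
    "10.0  8.04\n"
    "8.0   6.95\n"
    "13.0  7.58\n"
)

# read_fwf infers the column boundaries from the aligned text.
fixed = pd.read_fwf(io.StringIO(text))

# read_csv can split on runs of whitespace instead.
split = pd.read_csv(io.StringIO(text), sep=r'\s+')

print(fixed.dtypes)
print(split)
```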


In [6]:
# Excel file to DataFrame:

df = pd.read_excel('../datasets/person.xlsx')
df

Unnamed: 0,Name,Age,Gender
0,Siri,15,f
1,Laura,6,f
2,Oscar,5,m


In [7]:
# CSV file to DataFrame:

df = pd.read_csv('../datasets/GDP-2015.csv')
df

Unnamed: 0,Entity,Code,Year,GDP per capita
0,Afghanistan,AFG,2015,1928
1,Albania,ALB,2015,10947
2,Algeria,DZA,2015,13024
3,Angola,AGO,2015,8631
4,Argentina,ARG,2015,19316
...,...,...,...,...
163,Vietnam,VNM,2015,5733
164,World,OWID_WRL,2015,14500
165,Yemen,YEM,2015,2496
166,Zambia,ZMB,2015,3537
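
`pd.read_csv` takes optional arguments that shape the result while loading. The sketch below uses an in-memory copy of the first three rows shown above (typed in by hand, not read from `GDP-2015.csv`) to keep only some columns and use the country code as the row index.

```python
import io
import pandas as pd

# Hand-typed copy of the first rows of the table above.
csv_text = """Entity,Code,Year,GDP per capita
Afghanistan,AFG,2015,1928
Albania,ALB,2015,10947
Algeria,DZA,2015,13024
"""

gdp_df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['Entity', 'Code', 'GDP per capita'],  # skip the 'Year' column
    index_col='Code',                              # use the code as row label
)

print(gdp_df)
print(gdp_df.loc['ALB', 'GDP per capita'])
```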


In [8]:
iris = datasets.load_iris()
type(iris)

sklearn.utils.Bunch

In [9]:
# scikit-learn dataset (a Bunch object) to DataFrame:

iris_df = pd.DataFrame(iris.data, columns = iris.feature_names)
iris_df

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm)
0,5.1,3.5,1.4,0.2
1,4.9,3.0,1.4,0.2
2,4.7,3.2,1.3,0.2
3,4.6,3.1,1.5,0.2
4,5.0,3.6,1.4,0.2
...,...,...,...,...
145,6.7,3.0,5.2,2.3
146,6.3,2.5,5.0,1.9
147,6.5,3.0,5.2,2.0
148,6.2,3.4,5.4,2.3


## Looking at DataFrames

In [10]:
df.head()  # first 5 rows

Unnamed: 0,Entity,Code,Year,GDP per capita
0,Afghanistan,AFG,2015,1928
1,Albania,ALB,2015,10947
2,Algeria,DZA,2015,13024
3,Angola,AGO,2015,8631
4,Argentina,ARG,2015,19316


In [11]:
df.tail(3)  #last 3 rows

Unnamed: 0,Entity,Code,Year,GDP per capita
165,Yemen,YEM,2015,2496
166,Zambia,ZMB,2015,3537
167,Zimbabwe,ZWE,2015,1759


In [12]:
df.describe()

Unnamed: 0,Year,GDP per capita
count,168.0,168.0
mean,2015.0,18194.47619
std,0.0,19249.613433
min,2015.0,605.0
25%,2015.0,3714.0
50%,2015.0,11794.0
75%,2015.0,25816.5
max,2015.0,139542.0
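
Besides `head`, `tail` and `describe`, a few other attributes and methods help when inspecting a DataFrame. The sketch below applies them to a hand-typed copy of the first three rows of the GDP table, so it runs without the CSV file.

```python
import pandas as pd

# First three rows of the GDP table above, typed in by hand.
sample = pd.DataFrame({
    'Entity': ['Afghanistan', 'Albania', 'Algeria'],
    'Code': ['AFG', 'ALB', 'DZA'],
    'Year': [2015, 2015, 2015],
    'GDP per capita': [1928, 10947, 13024],
})

print(sample.shape)    # (rows, columns)
print(sample.dtypes)   # one dtype per column
sample.info()          # dtypes plus non-null counts

# By default describe() covers only numeric columns;
# include='all' also summarises the text columns.
summary = sample.describe(include='all')
print(summary)
```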


## Working with DataFrames

Grab a column:

In [13]:
df.columns

Index(['Entity', 'Code', 'Year', 'GDP per capita'], dtype='object')

In [14]:
countries = df['Entity']
countries

0      Afghanistan
1          Albania
2          Algeria
3           Angola
4        Argentina
          ...     
163        Vietnam
164          World
165          Yemen
166         Zambia
167       Zimbabwe
Name: Entity, Length: 168, dtype: object

In [15]:
gdp = df['GDP per capita']
gdp

0       1928
1      10947
2      13024
3       8631
4      19316
       ...  
163     5733
164    14500
165     2496
166     3537
167     1759
Name: GDP per capita, Length: 168, dtype: int64
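
A single column name returns a Series, as above. Passing a list of names returns a smaller DataFrame instead. A minimal sketch on a hand-typed copy of the first rows of the GDP table:

```python
import pandas as pd

# Hand-typed copy of the first rows of the GDP table.
sample = pd.DataFrame({
    'Entity': ['Afghanistan', 'Albania', 'Algeria'],
    'Code': ['AFG', 'ALB', 'DZA'],
    'Year': [2015, 2015, 2015],
    'GDP per capita': [1928, 10947, 13024],
})

one_column = sample['Entity']                       # single name: Series
two_columns = sample[['Entity', 'GDP per capita']]  # list of names: DataFrame

print(type(one_column).__name__)
print(type(two_columns).__name__)
print(two_columns)
```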

Grab an entry:

In [16]:
gdpAngola = df.loc[3, 'GDP per capita']  # row label 3 is Angola
gdpAngola

8631
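
There are several ways to reach a single entry. The sketch below compares label-based lookup (`loc`), position-based lookup (`iloc`) and lookup by country name, using a hand-typed copy of the first four rows of the GDP table.

```python
import pandas as pd

# Hand-typed copy of the first four rows of the GDP table.
gdp_table = pd.DataFrame({
    'Entity': ['Afghanistan', 'Albania', 'Algeria', 'Angola'],
    'GDP per capita': [1928, 10947, 13024, 8631],
})

by_label = gdp_table.loc[3, 'GDP per capita']   # row label 3, column name
by_position = gdp_table.iloc[3, 1]              # 4th row, 2nd column

# Look up by country name: filter the rows, then take the first match.
is_angola = gdp_table['Entity'] == 'Angola'
by_name = gdp_table.loc[is_angola, 'GDP per capita'].iloc[0]

print(by_label, by_position, by_name)
```

Looking up by name avoids having to know the row number in advance.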

## Small example

In [17]:
persons = pd.read_excel('../datasets/person.xlsx')
persons['Name']

0     Siri
1    Laura
2    Oscar
Name: Name, dtype: object

Add a column:

In [18]:
persons['HasBike'] = True
persons.head()

Unnamed: 0,Name,Age,Gender,HasBike
0,Siri,15,f,True
1,Laura,6,f,True
2,Oscar,5,m,True
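
A new column does not have to be a constant. It can also hold one value per row or be computed from an existing column. In the sketch below the table is rebuilt by hand from the output above, and the per-row `HasBike` values and the `IsTeen` column are hypothetical.

```python
import pandas as pd

# Same content as person.xlsx, rebuilt by hand from the output above.
persons = pd.DataFrame({
    'Name': ['Siri', 'Laura', 'Oscar'],
    'Age': [15, 6, 5],
    'Gender': ['f', 'f', 'm'],
})

# One value per row (hypothetical values).
persons['HasBike'] = [True, False, True]

# A column computed from another column ('IsTeen' is a made-up name).
persons['IsTeen'] = persons['Age'] >= 13

print(persons)
```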


Save changes to a file:

In [19]:
persons.to_excel('../datasets/person_new.xlsx')

A file called person_new.xlsx should now exist in the `../datasets` directory.
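
The save-and-check step can also be done in code. The sketch below swaps in CSV for Excel so that it runs without an Excel writer installed, and it writes to a temporary directory rather than `../datasets`. Passing `index=False` keeps the row index out of the file, so reading it back does not add an extra unnamed column.

```python
import os
import tempfile
import pandas as pd

# Same content as the persons table above, rebuilt by hand.
persons = pd.DataFrame({
    'Name': ['Siri', 'Laura', 'Oscar'],
    'Age': [15, 6, 5],
    'Gender': ['f', 'f', 'm'],
    'HasBike': [True, True, True],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'person_new.csv')   # hypothetical file name
    persons.to_csv(path, index=False)            # do not write the row index
    file_exists = os.path.exists(path)           # confirm the file was created
    reloaded = pd.read_csv(path)                 # read it back to compare

print(file_exists)
print(reloaded)
```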