# Intro to Pandas

[Data Science Handbook (with notebooks!)](https://jakevdp.github.io/PythonDataScienceHandbook/)

[Basics of Pandas](https://towardsdatascience.com/6-basic-pandas-techniques-you-need-to-know-2c5725746938)

[Pandas cheat sheet](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)

Pandas is a Python library for data manipulation and data analysis. A Pandas *DataFrame* is a table with rows and columns. Typically each row holds one data point, and each column holds one feature of that data point.

In [19]:
import pandas as pd

## Converting from format X to DataFrame

In [2]:
# List of numbers to DataFrame:

num_list = [1, 2, 3, 4, 5]
df = pd.DataFrame(num_list)
print(df)

   0
0  1
1  2
2  3
3  4
4  5


In [3]:
# List of tuples to DataFrame (each tuple becomes one row):

num_list = [(1, 2), (3, 4), (5, 3)]
df = pd.DataFrame(num_list)
print(df)

   0  1
0  1  2
1  3  4
2  5  3
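
By default the columns are labelled `0`, `1`, and so on. A minimal sketch of naming them at construction time with the `columns` argument follows; the names `a` and `b` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Same list of tuples as above; "a" and "b" are made-up column names.
pairs = [(1, 2), (3, 4), (5, 3)]
df_named = pd.DataFrame(pairs, columns=["a", "b"])

print(df_named.shape)          # (number of rows, number of columns)
print(list(df_named.columns))  # the column labels
```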


In [4]:
# Text file (with spaces) to DataFrame:

df = pd.read_fwf('../datasets/primitivo.txt')
print(df)

           x y
0    10.0 8.04
1     8.0 6.95
2    13.0 7.58
3     9.0 8.81
4    11.0 8.33
5    14.0 9.96
6     6.0 7.24
7     4.0 4.26
8   12.0 10.84
9     7.0 4.82
10    5.0 5.68
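
The printed output above appears to show a single column labelled `x y`, which suggests the two fields were not split apart. That is only a reading of the output, not something the file itself confirms. If the file is plain whitespace-separated text, one possible alternative is `pd.read_csv` with `sep=r"\s+"`. The sketch below uses an in-memory string as a stand-in for the file, with the first three rows copied from the output above.

```python
import io
import pandas as pd

# Stand-in for a whitespace-separated text file (assumption: the real file
# has a header line "x y" followed by two numbers per line).
text = "x y\n10.0 8.04\n8.0 6.95\n13.0 7.58\n"

# A regular-expression separator splits on any run of whitespace.
df_xy = pd.read_csv(io.StringIO(text), sep=r"\s+")

print(df_xy.columns.tolist())
print(df_xy.dtypes)
```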


In [20]:
# Excel file to DataFrame:

df = pd.read_excel('../datasets/person.xlsx')
print(df)

    Name  Age Gender
0   Siri   15      f
1  Laura    6      f
2  Oscar    5      m


In [6]:
# CSV file to DataFrame:

df = pd.read_csv('../datasets/GDP-2015.csv')
print(df)

          Entity Code  Year  GDP per capita
0    Afghanistan  AFG  2015            1928
1        Albania  ALB  2015           10947
2        Algeria  DZA  2015           13024
3         Angola  AGO  2015            8631
4      Argentina  ARG  2015           19316
..           ...  ...   ...             ...
162    Venezuela  VEN  2015           16257
163      Vietnam  VNM  2015            5733
164        Yemen  YEM  2015            2496
165       Zambia  ZMB  2015            3537
166     Zimbabwe  ZWE  2015            1759

[167 rows x 4 columns]


## Looking at DataFrames

In [7]:
df.head()  # first 5 rows

        Entity Code  Year  GDP per capita
0  Afghanistan  AFG  2015            1928
1      Albania  ALB  2015           10947
2      Algeria  DZA  2015           13024
3       Angola  AGO  2015            8631
4    Argentina  ARG  2015           19316


In [8]:
df.tail(3)  # last 3 rows

       Entity Code  Year  GDP per capita
164     Yemen  YEM  2015            2496
165    Zambia  ZMB  2015            3537
166  Zimbabwe  ZWE  2015            1759


In [9]:
df.describe()

         Year  GDP per capita
count   167.0           167.0
mean   2015.0    18216.598802
std       0.0    19305.364946
min    2015.0           605.0
25%    2015.0          3705.0
50%    2015.0         11738.0
75%    2015.0         25843.0
max    2015.0        139542.0
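
`describe()` reports several statistics at once. Each one can also be computed on its own from a single column. The sketch below is self-contained, so it rebuilds a small frame by hand from the first five rows of the GDP table instead of reading the CSV file.

```python
import pandas as pd

# First five rows of the GDP table, typed in by hand for illustration.
sample = pd.DataFrame({
    "Entity": ["Afghanistan", "Albania", "Algeria", "Angola", "Argentina"],
    "GDP per capita": [1928, 10947, 13024, 8631, 19316],
})

gdp = sample["GDP per capita"]
print(gdp.mean())    # arithmetic mean of the five values
print(gdp.median())  # middle value after sorting
print(gdp.max())     # largest value
```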


## Working with DataFrames

Grab a column:

In [10]:
df.columns

Index(['Entity', 'Code', 'Year', 'GDP per capita'], dtype='object')

In [11]:
countries = df['Entity']
countries

0      Afghanistan
1          Albania
2          Algeria
3           Angola
4        Argentina
          ...     
162      Venezuela
163        Vietnam
164          Yemen
165         Zambia
166       Zimbabwe
Name: Entity, Length: 167, dtype: object

In [12]:
gdp = df['GDP per capita']
gdp

0       1928
1      10947
2      13024
3       8631
4      19316
       ...  
162    16257
163     5733
164     2496
165     3537
166     1759
Name: GDP per capita, Length: 167, dtype: int64
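
Selecting with a single column name returns a `Series`. Selecting with a list of names returns a smaller `DataFrame`. A minimal sketch, again using a hand-typed sample of the GDP table rather than the CSV file:

```python
import pandas as pd

# Hand-typed sample: the first three rows of the GDP table.
sample = pd.DataFrame({
    "Entity": ["Afghanistan", "Albania", "Algeria"],
    "Code": ["AFG", "ALB", "DZA"],
    "Year": [2015, 2015, 2015],
    "GDP per capita": [1928, 10947, 13024],
})

one_column = sample["Entity"]                       # a Series
two_columns = sample[["Entity", "GDP per capita"]]  # a DataFrame

print(type(one_column).__name__)
print(type(two_columns).__name__)
```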

Grab an entry:

In [13]:
# Row label 3 is Angola; .loc looks up a value by row label and column name.
gdp_angola = df.loc[3, 'GDP per capita']
gdp_angola

8631
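
The same entry can be reached in several ways. The sketch below compares lookup by row label (`.loc`), by integer position (`.iloc`), and by a condition on another column. It uses a hand-typed sample of the first five rows so that it runs without the CSV file.

```python
import pandas as pd

# First five rows of the GDP table, typed in by hand for illustration.
sample = pd.DataFrame({
    "Entity": ["Afghanistan", "Albania", "Algeria", "Angola", "Argentina"],
    "GDP per capita": [1928, 10947, 13024, 8631, 19316],
})

# By label: row label 3, column name 'GDP per capita'.
by_label = sample.loc[3, "GDP per capita"]

# By position: fourth row, second column (both counted from zero).
by_position = sample.iloc[3, 1]

# By condition: keep rows where Entity is 'Angola', then take the first match.
by_name = sample.loc[sample["Entity"] == "Angola", "GDP per capita"].iloc[0]

print(by_label, by_position, by_name)
```

Looking up by name is usually the safest of the three, since it does not depend on the row order of the file.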

## Small example

In [16]:
personer = pd.read_excel('../datasets/person.xlsx')
personer['Name']

0     Siri
1    Laura
2    Oscar
Name: Name, dtype: object

Add a column:

In [21]:
personer['HasBike'] = True
personer.head()

    Name  Age Gender  HasBike
0   Siri   15      f     True
1  Laura    6      f     True
2  Oscar    5      m     True
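
Assigning a single value fills every row with it. A new column can also be computed from an existing one. In the sketch below, the column name `IsTeen` and the age threshold of 13 are hypothetical, and the table is typed in by hand so the example runs without the Excel file.

```python
import pandas as pd

# Same three people as in person.xlsx, typed in by hand.
personer = pd.DataFrame({
    "Name": ["Siri", "Laura", "Oscar"],
    "Age": [15, 6, 5],
    "Gender": ["f", "f", "m"],
})

# The comparison is evaluated row by row and yields one boolean per person.
# 'IsTeen' is a made-up column name for illustration.
personer["IsTeen"] = personer["Age"] >= 13

print(personer)
```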


Save changes to a file:

In [22]:
personer.to_excel('../datasets/person_ny.xlsx')
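
By default, `to_excel` and `to_csv` also write the row index as an extra first column. Passing `index=False` leaves it out. The sketch below shows this with `to_csv` and an in-memory buffer so that it runs without an Excel writer installed; `to_excel` accepts the same `index` argument.

```python
import io
import pandas as pd

# Same three people as in person.xlsx, typed in by hand.
personer = pd.DataFrame({
    "Name": ["Siri", "Laura", "Oscar"],
    "Age": [15, 6, 5],
    "Gender": ["f", "f", "m"],
})

# Write without the index column, then read the result back in.
buffer = io.StringIO()
personer.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored.columns.tolist())
```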