# The Pandas library

## What is Pandas?

https://pandas.pydata.org/

- An open source data analysis tool
- Built-in functions to read and write tabular data in many file formats (.xlsx, .csv, .hdf, etc.)
- Works well with matplotlib and other Python libraries

## Pros for the new user
- Well documented, both officially (pandas.pydata.org) and unofficially (zillions of blogs)
- Large, active user community on YouTube and Stack Exchange

## Cons for the new user
- The syntax can be very verbose
- In a collaborative environment, people who are not familiar with Pandas may have trouble reading your code



In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Pandas dataframe

Dataframes are the primary data object in the Pandas library.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html

- Two dimensional tabular data
- Size mutable (the 2D shape can change, i.e. the user can add or remove rows and columns)
- Columns do not have to share one data type (though keeping each column to a single type is good practice)
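The properties listed above can be sketched with a small example. The column names and values here (`label`, `n`, `weight`) are made up for illustration and do not come from the example file used later.

```python
import pandas as pd

# Hypothetical data: each column holds a different type (text, int, float).
df_demo = pd.DataFrame({
    "label": ["a", "b", "c"],
    "n": [1, 2, 3],
    "weight": [0.5, 1.5, 2.5],
})
print(df_demo.shape)   # (3, 3)
print(df_demo.dtypes)  # one dtype per column

# Size mutability: add a column, then remove a row.
df_demo["double_n"] = df_demo["n"] * 2
df_demo = df_demo.drop(index=0)   # drop the row labelled 0
print(df_demo.shape)   # (2, 4)
```

Note that `drop` returns a new dataframe rather than changing the original, which is why the result is assigned back to `df_demo`.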

In [6]:
df = pd.DataFrame([1,2,3])
df

   0
0  1
1  2
2  3


## Reading and writing data

https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html


In [8]:
df = pd.read_excel("example.xlsx")
df.head()

   x   y
0  0   0
1  1   3
2  2   6
3  3   9
4  4  12
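After reading a file it is often worth checking what was loaded. The sketch below uses `pd.read_csv` on an in-memory string so that it runs without any file on disk; the text in `csv_text` is a hypothetical stand-in and is not the contents of `example.xlsx`.

```python
import io
import pandas as pd

# Hypothetical stand-in for a file on disk.
csv_text = "x,y\n0,0\n1,3\n2,6\n3,9\n4,12\n5,15\n"
df_demo = pd.read_csv(io.StringIO(csv_text))

print(df_demo.head(3))  # first three rows only
print(df_demo.shape)    # (number of rows, number of columns)
print(df_demo.dtypes)   # data type of each column
```

`head()` with no argument shows the first five rows, which is why the output above stops at row 4 even though the dataframe is longer.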


Let's add some new data to this dataframe. Adding a column is easy: assign to a new column name.

In [12]:
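def fun(x):
    # Quadratic; applied element-wise when x is a whole column
    return x*x + 3*x + 10
# Create a new column "z" from the existing column "x"
df["z"] = fun(df.x)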
df.head()

   x   y   z
0  0   0  10
1  1   3  14
2  2   6  20
3  3   9  28
4  4  12  38
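Two related operations are sketched here under assumed data: `assign`, which adds a column without modifying the original dataframe, and `pd.concat`, which is one way to append rows. The names `df_demo`, `with_z`, `extra`, and `longer` are illustrative only.

```python
import pandas as pd

df_demo = pd.DataFrame({"x": [0, 1, 2]})

# assign returns a new dataframe with the extra column; df_demo is unchanged.
with_z = df_demo.assign(z=lambda d: d["x"] ** 2 + 3 * d["x"] + 10)

# Adding rows: concatenate with another dataframe that has the same columns.
extra = pd.DataFrame({"x": [3], "z": [28]})
longer = pd.concat([with_z, extra], ignore_index=True)

print(longer)
```

`ignore_index=True` renumbers the rows from 0, so the appended row gets index 3 instead of repeating index 0.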


A column can be accessed in several equivalent ways:

In [13]:
df.z

0      10
1      14
2      20
3      28
4      38
5      50
6      64
7      80
8      98
9     118
10    140
11    164
12    190
13    218
14    248
15    280
16    314
17    350
18    388
19    428
20    470
21    514
22    560
23    608
24    658
25    710
Name: z, dtype: int64

In [14]:
df["z"]

0      10
1      14
2      20
3      28
4      38
5      50
6      64
7      80
8      98
9     118
10    140
11    164
12    190
13    218
14    248
15    280
16    314
17    350
18    388
19    428
20    470
21    514
22    560
23    608
24    658
25    710
Name: z, dtype: int64

In [15]:
df.loc[:,"z"]

0      10
1      14
2      20
3      28
4      38
5      50
6      64
7      80
8      98
9     118
10    140
11    164
12    190
13    218
14    248
15    280
16    314
17    350
18    388
19    428
20    470
21    514
22    560
23    608
24    658
25    710
Name: z, dtype: int64
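The three forms above return the same Series. As a rough sketch on made-up data, the example below checks that and shows a few other common selections: label-based `.loc`, position-based `.iloc`, and a boolean mask.

```python
import pandas as pd

df_demo = pd.DataFrame({"x": [0, 1, 2, 3], "z": [10, 14, 20, 28]})

# Attribute, bracket, and .loc access give the same column.
same = df_demo.z.equals(df_demo["z"]) and df_demo["z"].equals(df_demo.loc[:, "z"])
print(same)  # True

# .loc selects by label; the end of the slice is included.
first_two = df_demo.loc[0:1, ["x", "z"]]

# .iloc selects by integer position; the end of the slice is excluded.
by_position = df_demo.iloc[0:2, 1]

# A boolean mask keeps only the rows where the condition holds.
large = df_demo[df_demo["z"] > 15]
```

Attribute access (`df.z`) only works when the column name is a valid Python identifier and does not clash with an existing dataframe attribute or method, so bracket access is the safer default.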

Perform calculations on a column much as you would with a NumPy array; arithmetic is applied element-wise:

In [17]:
df.z*3

0       30
1       42
2       60
3       84
4      114
5      150
6      192
7      240
8      294
9      354
10     420
11     492
12     570
13     654
14     744
15     840
16     942
17    1050
18    1164
19    1284
20    1410
21    1542
22    1680
23    1824
24    1974
25    2130
Name: z, dtype: int64
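A few more calculations of this kind are sketched below on made-up data. An expression such as `df.z*3` produces a new Series and leaves the dataframe unchanged; the result has to be assigned to a column to be kept.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({"x": [0, 1, 2], "z": [10, 14, 20]})

tripled = df_demo["z"] * 3            # element-wise: 30, 42, 60
total = df_demo["x"] + df_demo["z"]   # two columns, matched by index: 10, 15, 22
roots = np.sqrt(df_demo["x"])         # NumPy functions accept a Series
mean_z = df_demo["z"].mean()          # reduction to a single number

# Store a result as a new column to keep it.
df_demo["z3"] = tripled
print(df_demo)
```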

Save this data:

In [16]:
df.to_csv("updated_example.csv")
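By default `to_csv` also writes the row index as an unnamed first column, which reappears as a column called `Unnamed: 0` when the file is read back. The sketch below shows this and the `index=False` alternative; it writes to strings instead of files so that it runs anywhere, and the data is made up.

```python
import io
import pandas as pd

df_demo = pd.DataFrame({"x": [0, 1, 2], "y": [0, 3, 6]})

# Default: the index is written as an extra, unnamed first column.
with_index = df_demo.to_csv()   # no path given, so the CSV text is returned
back = pd.read_csv(io.StringIO(with_index))
print(list(back.columns))       # ['Unnamed: 0', 'x', 'y']

# index=False leaves the index out, so the round trip preserves the columns.
without_index = df_demo.to_csv(index=False)
clean = pd.read_csv(io.StringIO(without_index))
print(list(clean.columns))      # ['x', 'y']
```

If a file was already written with its index, passing `index_col=0` to `read_csv` is another way to recover the original layout.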