## Welcome to Pandas! - Introductory lesson to Pandas


First, we need to import the package. In Python, you must import a package or function before you can use it. To call a function from a package, prefix it with the package name. To save typing, you can import the package under a shorter alias.

In [2]:
import pandas

In [3]:
import pandas as pd

First, let's create a list

In [4]:
d = [0,1,2,3,4,5]

In [5]:
print(d)

[0, 1, 2, 3, 4, 5]


Now we can create a data frame with the `DataFrame` constructor from pandas. Since we imported pandas as `pd`, we call it as `pd.DataFrame`

In [7]:
myDf = pd.DataFrame(d)

In [8]:
print(myDf)

   0
0  0
1  1
2  2
3  3
4  4
5  5
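
As a quick check (a minimal sketch, not part of the original lesson), the attributes `shape`, `columns` and `index` confirm what the printout shows: six rows, one column labelled with the default integer `0`, and default row labels 0 to 5.

```python
import pandas as pd

d = [0, 1, 2, 3, 4, 5]
myDf = pd.DataFrame(d)

print(myDf.shape)          # (rows, columns) -> (6, 1)
print(list(myDf.columns))  # default column label -> [0]
print(list(myDf.index))    # default row labels -> [0, 1, 2, 3, 4, 5]
```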


We can rename the columns by assigning a list of new names to the `columns` attribute of the data frame

In [9]:
myDf.columns = ['Col1']

In [10]:
myDf

   Col1
0     0
1     1
2     2
3     3
4     4
5     5
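
An alternative worth knowing is the `rename` method. The sketch below assumes the same single-column data frame as above. Unlike assigning to `columns`, `rename` returns a new data frame and leaves the original untouched. The variable name `renamed` is only illustrative.

```python
import pandas as pd

myDf = pd.DataFrame([0, 1, 2, 3, 4, 5])

# rename() maps old labels to new ones and returns a new data frame;
# the original keeps its default label 0.
renamed = myDf.rename(columns={0: 'Col1'})

print(list(myDf.columns))     # [0]
print(list(renamed.columns))  # ['Col1']
```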


You can insert a new column into the data frame by assigning to a new column name. A single value is repeated for every row

In [11]:
myDf['NewCol'] = 5

In [12]:
myDf

   Col1  NewCol
0     0       5
1     1       5
2     2       5
3     3       5
4     4       5
5     5       5


You can modify a column by selecting it from the data frame and assigning the new values back to it

In [13]:
myDf['NewCol'] = myDf['NewCol'] + 1

In [14]:
myDf

   Col1  NewCol
0     0       6
1     1       6
2     2       6
3     3       6
4     4       6
5     5       6


You can delete a column from the data frame using the `del` statement

In [15]:
del myDf['NewCol']

In [16]:
myDf

   Col1
0     0
1     1
2     2
3     3
4     4
5     5
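
If you would rather keep the original data frame intact, the `drop` method is one option. This is a small sketch with an illustrative variable name (`trimmed`): `drop` returns a copy without the column, whereas `del` changes the data frame in place.

```python
import pandas as pd

myDf = pd.DataFrame({'Col1': [0, 1, 2, 3, 4, 5]})
myDf['NewCol'] = 6

# drop() returns a copy without the column; myDf itself is unchanged.
trimmed = myDf.drop(columns=['NewCol'])

print(list(trimmed.columns))  # ['Col1']
print(list(myDf.columns))     # ['Col1', 'NewCol']
```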


In [18]:
myDf['test'] = [5,4,3,2,1,0]

In [19]:
myDf

   Col1  test
0     0     5
1     1     4
2     2     3
3     3     2
4     4     1
5     5     0
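
The cell above creates a column from a list, which works because the list has exactly one value per row. The following sketch (the column name `tooShort` is made up for illustration) shows that a list of the wrong length is rejected with a `ValueError`.

```python
import pandas as pd

myDf = pd.DataFrame({'Col1': [0, 1, 2, 3, 4, 5]})

try:
    myDf['tooShort'] = [1, 2, 3]  # 3 values for 6 rows
    error_raised = False
except ValueError as err:
    error_raised = True
    print(err)  # pandas explains the length mismatch
```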


We can create new columns using existing columns 

In [20]:
myDf['ExtraCol'] = myDf['Col1'] + myDf['test']

In [21]:
myDf

   Col1  test  ExtraCol
0     0     5         5
1     1     4         5
2     2     3         5
3     3     2         5
4     4     1         5
5     5     0         5
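
Addition is only one possibility. As a short sketch, other operators are applied row by row in the same way. The column names `Product` and `IsBigger` are invented for this example.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [0, 1, 2, 3, 4, 5],
                   'test': [5, 4, 3, 2, 1, 0]})

# Arithmetic and comparisons between columns work element by element.
df['Product'] = df['Col1'] * df['test']
df['IsBigger'] = df['Col1'] > df['test']

print(df['Product'].tolist())   # [0, 4, 6, 6, 4, 0]
print(df['IsBigger'].tolist())  # [False, False, False, True, True, True]
```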


The index of a data frame provides the label for each row. It can be used as a key to match rows between data frames, similar to joins in SQL databases

In [22]:
myDf.index

Int64Index([0, 1, 2, 3, 4, 5], dtype='int64')

In [23]:
i = ['a', 'b','c', 'd','e','f']

In [24]:
myDf.index = i

In [25]:
myDf

   Col1  test  ExtraCol
a     0     5         5
b     1     4         5
c     2     3         5
d     3     2         5
e     4     1         5
f     5     0         5
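
To illustrate the comparison with SQL joins, here is a minimal sketch using two small, made-up data frames (`left` and `right`). The `join` method lines rows up by their index labels, much like a left join on a key column.

```python
import pandas as pd

left = pd.DataFrame({'Col1': [0, 1, 2]}, index=['a', 'b', 'c'])
right = pd.DataFrame({'Other': [10, 30]}, index=['a', 'c'])

# join() matches rows by index label. Label 'b' has no match in
# `right`, so its 'Other' value is filled with NaN.
joined = left.join(right)

print(joined)
```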


We can select rows by their index label using the `.loc` indexer. Note that a label slice such as `'a':'d'` includes both endpoints

In [26]:
myDf.loc['a']

Col1        0
test        5
ExtraCol    5
Name: a, dtype: int64

In [28]:
myDf.loc['a':'d']

   Col1  test  ExtraCol
a     0     5         5
b     1     4         5
c     2     3         5
d     3     2         5



Notice the double brackets in this case: the inner brackets create a list with the elements `'a'` and `'d'`, so only those two rows are selected

In [30]:
myDf.loc[['a','d']]

   Col1  test  ExtraCol
a     0     5         5
d     3     2         5


To select rows by position (row number), use the `.iloc` indexer instead. Position slices exclude the end, as in regular Python slicing

In [31]:
myDf.iloc[0:3]

   Col1  test  ExtraCol
a     0     5         5
b     1     4         5
c     2     3         5
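
The difference between the two indexers is easy to trip over, so here is a small side-by-side sketch on a reduced version of the data frame. A label slice with `.loc` includes its end label, while a position slice with `.iloc` excludes its end position.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [0, 1, 2, 3, 4, 5]},
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

by_label = df.loc['a':'c']   # label slice: end label 'c' is included
by_position = df.iloc[0:3]   # position slice: position 3 is excluded

print(by_label.index.tolist())     # ['a', 'b', 'c']
print(by_position.index.tolist())  # ['a', 'b', 'c']
print(df.iloc[-1]['Col1'])         # last row by position -> 5
```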


We can also select a column by indexing with its name, like looking up a key in a dictionary. Passing a list of names returns a data frame with several columns

In [32]:
myDf['Col1']

a    0
b    1
c    2
d    3
e    4
f    5
Name: Col1, dtype: int64

In [33]:
myDf[['Col1','test']]

   Col1  test
a     0     5
b     1     4
c     2     3
d     3     2
e     4     1
f     5     0


We can even slice rows of the already selected columns. A plain slice like `[1:3]` works by position and excludes the end

In [34]:
myDf[['Col1', 'test']][1:2]

   Col1  test
b     1     4


In [35]:
myDf[['Col1', 'test']][1:3]

   Col1  test
b     1     4
c     2     3


In [36]:
myDf[['Col1','test']][3:]

   Col1  test
d     3     2
e     4     1
f     5     0


In [37]:
myDf[['Col1','test']][:3]

   Col1  test
a     0     5
b     1     4
c     2     3
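
The same selection can also be written in a single step. The sketch below, which rebuilds the lesson's data frame under the name `df`, compares the chained form with `.loc` (labels) and `.iloc` (positions). All three give the same result here.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [0, 1, 2, 3, 4, 5],
                   'test': [5, 4, 3, 2, 1, 0],
                   'ExtraCol': [5] * 6},
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

chained = df[['Col1', 'test']][1:3]            # columns first, then rows
by_label = df.loc['b':'c', ['Col1', 'test']]   # rows and columns by label
by_position = df.iloc[1:3, 0:2]                # rows and columns by position

print(chained.equals(by_label))     # True
print(chained.equals(by_position))  # True
```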


One way to look at part of a data frame is with the methods `head()` and `tail()`, which show the first and last 5 rows by default

In [39]:
myDf.head()

   Col1  test  ExtraCol
a     0     5         5
b     1     4         5
c     2     3         5
d     3     2         5
e     4     1         5
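
Both methods also accept the number of rows to return. A brief sketch on a one-column version of the data frame:

```python
import pandas as pd

df = pd.DataFrame({'Col1': [0, 1, 2, 3, 4, 5]},
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

print(df.head(2).index.tolist())  # first 2 rows -> ['a', 'b']
print(df.tail(2).index.tolist())  # last 2 rows  -> ['e', 'f']
print(df.tail().index.tolist())   # default of 5 -> ['b', 'c', 'd', 'e', 'f']
```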


In IPython or Jupyter, append `?` to a name to check what a function does and how it works. What does `zip` do?

In [40]:
zip?
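
The `?` syntax only works inside IPython or Jupyter. In a plain Python session, `help(zip)` or the docstring gives similar information. The sketch below prints the docstring and tries `zip` on two short, made-up lists: it pairs up elements and stops at the shorter input.

```python
# The docstring is what `zip?` displays in IPython.
print(zip.__doc__)

letters = ['a', 'b', 'c']
numbers = [1, 2, 3, 4]

# zip pairs elements by position and stops at the shortest input.
pairs = list(zip(letters, numbers))
print(pairs)  # [('a', 1), ('b', 2), ('c', 3)]
```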

You can also import part of a package without importing the whole package. In this case we need the `random` module from numpy

In [41]:
from numpy import random

We will set the seed of the random number generator so that we obtain the same numbers every time the code is run

In [42]:
random.seed(500)
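
To see what the seed buys us, here is a small sketch: resetting the seed to the same value makes the generator repeat the same sequence. The variable names `first` and `second` are only for illustration.

```python
from numpy import random

random.seed(500)
first = random.randint(0, 1000, size=5)

random.seed(500)  # same seed again
second = random.randint(0, 1000, size=5)

# Both draws are identical because the generator was reset.
print((first == second).all())  # True
```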

Let's create a list with five names

In [43]:
names = ['Bob', 'Jessica', 'Mary', 'John', 'Ana']

Now let's make a list that randomly picks one of the names in our name list for each element. This new list contains 1000 elements. The following two cells are equivalent ways of building such a list.

In [50]:
random_names = [names[random.randint(low = 0, high = len(names))] for i in range(1000)]

In [79]:
random_names2 = []
for i in range(1000):
    random_names2.append(names[random.randint(0, 5)])  

In [45]:
len(names)

5

In [49]:
random.randint?

In [51]:
random_names[:10]

['Bob',
 'John',
 'Mary',
 'Ana',
 'John',
 'John',
 'Jessica',
 'Bob',
 'John',
 'Jessica']
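
Two details are worth checking here. First, the `high` argument of `random.randint` is exclusive, so the index never runs past the end of the list. Second, `random.choice` can pick the names directly. The following is a sketch of that alternative, not the lesson's method, and it will not necessarily produce the same sequence of names as the cells above.

```python
from numpy import random

names = ['Bob', 'Jessica', 'Mary', 'John', 'Ana']
random.seed(500)

# high is exclusive, so indices always fall in 0..len(names)-1.
indices = random.randint(low=0, high=len(names), size=1000)
print(indices.min() >= 0 and indices.max() <= len(names) - 1)  # True

# Alternative: let numpy pick the names directly.
picked = random.choice(names, size=1000)
print(len(picked))  # 1000
```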

Now let's create another list with 1000 random integers from 0 to 999 (the `high` value of 1000 is excluded)

In [58]:
births = [random.randint(low = 0, high = 1000) for i in range(1000)]

Now let's zip the baby names and the number of births. In Python 3, `zip` returns a lazy iterator, so wrap it in `list()` to get an actual list.

In [59]:
BabySet = zip(random_names, births)

In [60]:
BabySet

<zip at 0x10702d088>

In [61]:
BabySet = list(zip(random_names, births))

In [64]:
BabySet[:10]

[('Bob', 697),
 ('John', 770),
 ('Mary', 537),
 ('Ana', 714),
 ('John', 43),
 ('John', 288),
 ('Jessica', 909),
 ('Bob', 16),
 ('John', 289),
 ('Jessica', 585)]
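
One more reason to convert to a list, shown as a short sketch with a three-element sample: a `zip` object in Python 3 can only be consumed once, so a second pass over it yields nothing.

```python
names = ['Bob', 'John', 'Mary']
births = [697, 770, 537]

pairs = zip(names, births)   # lazy iterator, nothing computed yet
first_pass = list(pairs)     # consumes the iterator
second_pass = list(pairs)    # already exhausted

print(first_pass)   # [('Bob', 697), ('John', 770), ('Mary', 537)]
print(second_pass)  # []
```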

Now that we have this list, we can transform it into a data frame 

In [67]:
Babydf = pd.DataFrame(data = BabySet, columns = ['Names', 'Births'])

In [68]:
Babydf.head()

   Names  Births
0    Bob     697
1   John     770
2   Mary     537
3    Ana     714
4   John      43
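
Zipping is not the only route. As a sketch of an alternative, a dictionary that maps column names to lists gives a data frame with the same contents. The three-row sample and the names `from_pairs` and `from_dict` are illustrative.

```python
import pandas as pd

names = ['Bob', 'John', 'Mary']
births = [697, 770, 537]

# Row-wise: a list of (name, births) tuples plus column names.
from_pairs = pd.DataFrame(data=list(zip(names, births)),
                          columns=['Names', 'Births'])

# Column-wise: a dict mapping each column name to its values.
from_dict = pd.DataFrame({'Names': names, 'Births': births})

print(from_pairs.to_dict('list') == from_dict.to_dict('list'))  # True
```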


Now let's save this data frame to a CSV file

In [69]:
Babydf.to_csv?

In [70]:
Babydf.to_csv('births1880.csv', index = False, header = False)
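
To see what `index = False` and `header = False` change, it helps to look at the text that would be written. When no file name is given, `to_csv` returns the CSV content as a string. A small sketch with two made-up rows:

```python
import pandas as pd

df = pd.DataFrame({'Names': ['Bob', 'John'], 'Births': [697, 770]})

with_labels = df.to_csv()                       # index and header included
bare = df.to_csv(index=False, header=False)     # data rows only

print(with_labels.splitlines())  # [',Names,Births', '0,Bob,697', '1,John,770']
print(bare.splitlines())         # ['Bob,697', 'John,770']
```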

Now let's import this CSV. We are going to store the path of the file in the variable `Location`. 
Notice the `r` before the string. The `r` prefix makes it a raw string, so backslashes are kept as literal characters instead of starting escape sequences. 

In [71]:
Location = r'/Users/.../.../births1880.csv'
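
The `r` prefix matters most for Windows-style paths, which contain backslashes. The sketch below uses a hypothetical path, `data\new.csv`, purely to show the effect: without the prefix, `\n` is read as a newline character.

```python
# Hypothetical path, for illustration only.
plain_path = 'data\new.csv'   # \n becomes a newline character
raw_path = r'data\new.csv'    # backslash and n are kept as written

print(len(plain_path))  # 11
print(len(raw_path))    # 12
print('\n' in plain_path, '\n' in raw_path)  # True False
```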

You can import the CSV using `pd.read_csv`. Since the file was saved without a header row, we pass `header = None`

In [74]:
otherdf = pd.read_csv(Location, header =  None)

In [75]:
otherdf.head()

      0    1
0   Bob  697
1  John  770
2  Mary  537
3   Ana  714
4  John   43
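
The columns come back labelled `0` and `1` because the file has no header row. As a sketch, `read_csv` also takes a `names` argument to label them while reading. To keep the example self-contained it reads from an in-memory string (`csv_text`, a made-up three-row sample) instead of the file on disk.

```python
import io
import pandas as pd

csv_text = "Bob,697\nJohn,770\nMary,537\n"

# Without header=None the first data row is mistaken for the header.
wrong = pd.read_csv(io.StringIO(csv_text))
# header=None keeps every row as data and uses default labels 0 and 1.
no_names = pd.read_csv(io.StringIO(csv_text), header=None)
# names= supplies the column labels directly.
with_names = pd.read_csv(io.StringIO(csv_text), names=['Names', 'Births'])

print(list(wrong.columns), len(wrong))  # ['Bob', '697'] 2
print(list(no_names.columns))           # [0, 1]
print(list(with_names.columns))         # ['Names', 'Births']
```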


In [76]:
df = otherdf.head()

In [77]:
df

      0    1
0   Bob  697
1  John  770
2  Mary  537
3   Ana  714
4  John   43


These lessons were developed for the RStudyGroup in Vancouver, BC, based on lessons from iPython.org
