# Basic Pandas Operations

## Why do we use Pandas?

Pandas is a powerful Python library for creating, storing, and manipulating data frames. As a psychologist or data scientist, you will inevitably work with datasets of some kind. Pandas stores and represents data in a streamlined way that keeps things simple, whether you plan to analyze the data directly in Python or export it to CSV files for analysis in other statistical software such as R.

It also has other advantages: it simplifies writing code, handles large datasets more efficiently than plain Python loops, and makes data more flexible and customizable. More importantly, Pandas includes tools for handling missing data and cleaning up data, and it supports multiple file formats, including CSV, Excel, and SQL.

In this activity, we will learn some of the basic operations to get you started with Pandas.

## Storing data frames in a CSV file

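First, we create a dictionary containing the data. Then we convert the dictionary to a pandas data frame and store it as a CSV file. The ages and scores are generated randomly, so the values you get will differ from the output shown in this activity.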

In [2]:
import numpy as np
import pandas as pd

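# Each key becomes a column name; each list holds that column's 12 values
name_dict = {
    'Name':['Adam','Becky','Charlie','Daniel','Emily','Frank','Greta','Helen','Ian','Jack','Klaus','Lucy'],
    'Class ID':list(range(1,13)),
    # Random ages between 18 and 30, rounded to whole numbers
    'Age':[int(i) for i in (np.round(np.random.uniform(18,30,12),0))],
    # Random scores between 60 and 100, rounded to whole numbers
    'Score':[int(i) for i in (np.round(np.random.uniform(60,100,12),0))]
}

df = pd.DataFrame(name_dict)

# index=False keeps the row labels (0-11) out of the CSV file
df.to_csv('ClassList.csv',index=False)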
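
To see what `index=False` changes, here is a minimal sketch. It assumes a small stand-in data frame called `demo` (a hypothetical name, with values copied from the sample output in this activity) and writes to in-memory buffers rather than real files, so it does not touch `ClassList.csv`.

```python
import io
import pandas as pd

# Small stand-in table; `demo` is a hypothetical name used only for this sketch
demo = pd.DataFrame({'Name': ['Adam', 'Becky', 'Charlie'], 'Score': [63, 72, 82]})

# Write to in-memory text buffers instead of files on disk
with_index = io.StringIO()
without_index = io.StringIO()
demo.to_csv(with_index)                   # default: the row labels are written as an extra column
demo.to_csv(without_index, index=False)   # only the data columns are written

# Rewind the buffers and read them back
with_index.seek(0)
without_index.seek(0)
cols_with = list(pd.read_csv(with_index).columns)
cols_without = list(pd.read_csv(without_index).columns)

print(cols_with)      # ['Unnamed: 0', 'Name', 'Score']
print(cols_without)   # ['Name', 'Score']
```

Without `index=False`, the saved row labels come back as an extra column named `Unnamed: 0` when the file is read again.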

## Loading in data from a CSV file

After loading the data, we can run some quick checks using `head()` and `tail()`.

In [3]:
df = pd.read_csv('ClassList.csv')
df.head() # This will give you the first 5 rows of data

      Name  Class ID  Age  Score
0     Adam         1   28     63
1    Becky         2   27     72
2  Charlie         3   28     82
3   Daniel         4   22     82
4    Emily         5   24     71


In [4]:
df.tail() # This will give you the last 5 rows of data

     Name  Class ID  Age  Score
7   Helen         8   27     97
8     Ian         9   23     63
9    Jack        10   26     98
10  Klaus        11   29     94
11   Lucy        12   27     99
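
Beyond the default five rows, a few other quick checks are often useful. The sketch below assumes a stand-in data frame `demo` (a hypothetical name) built from the first five rows of the sample output; `head()` and `tail()` accept a row count, and `shape`, `dtypes`, and `describe()` are standard pandas attributes and methods.

```python
import pandas as pd

# Stand-in table; `demo` is a hypothetical name used only for this sketch
demo = pd.DataFrame({
    'Name': ['Adam', 'Becky', 'Charlie', 'Daniel', 'Emily'],
    'Class ID': [1, 2, 3, 4, 5],
    'Age': [28, 27, 28, 22, 24],
    'Score': [63, 72, 82, 82, 71],
})

first_two = demo.head(2)     # first 2 rows instead of the default 5
last_three = demo.tail(3)    # last 3 rows

print(demo.shape)                 # (number of rows, number of columns)
print(demo.dtypes)                # data type of each column
print(demo['Score'].describe())   # count, mean, std, min, quartiles, max
```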


## Accessing a particular row, column, or cell

In some cases, we want to access a particular row of the data.

In [5]:
df.loc[0,:] # Access the row labeled 0 (here, the first row); the first value selects the row(s) and the second selects the column(s)

Name        Adam
Class ID       1
Age           28
Score         63
Name: 0, dtype: object
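
Note that `loc` selects by label, while its counterpart `iloc` selects by integer position. In the class list the two coincide because the row labels are 0, 1, 2, and so on. The following sketch makes the difference visible by assuming a stand-in data frame `demo` with the hypothetical row labels 10, 20, and 30.

```python
import pandas as pd

# Stand-in table with non-default row labels (hypothetical, for illustration only)
demo = pd.DataFrame(
    {'Name': ['Adam', 'Becky', 'Charlie'], 'Score': [63, 72, 82]},
    index=[10, 20, 30],
)

by_label = demo.loc[10]       # the row whose label is 10
by_position = demo.iloc[0]    # the first row, whatever its label

label_slice = demo.loc[10:20]   # label slices include both endpoints
pos_slice = demo.iloc[0:1]      # position slices exclude the stop value

print(by_label['Name'], by_position['Name'])
print(len(label_slice), len(pos_slice))
```

Both lookups return Adam's row, but the label slice contains two rows while the position slice contains one.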

We can select a particular column by its name.

In [6]:
df.Score # Accessing the Score column

0     63
1     72
2     82
3     82
4     71
5     82
6     84
7     97
8     63
9     98
10    94
11    99
Name: Score, dtype: int64

Or you can select a column by its position, counting from 0.

In [8]:
df.iloc[:,2] # This will access the third column, which is Age (Score is at position 3)

0     28
1     27
2     28
3     22
4     24
5     21
6     26
7     27
8     23
9     26
10    29
11    27
Name: Age, dtype: int64
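
Attribute access such as `df.Score` only works when the column name is a valid Python identifier, so it cannot reach a column like `Class ID`. The sketch below, using a stand-in data frame `demo` (a hypothetical name), shows bracket notation, which works for any column name, and one way to look up a column's position instead of counting by hand.

```python
import pandas as pd

# Stand-in table; `demo` is a hypothetical name used only for this sketch
demo = pd.DataFrame({
    'Name': ['Adam', 'Becky', 'Charlie'],
    'Class ID': [1, 2, 3],
    'Age': [28, 27, 28],
    'Score': [63, 72, 82],
})

class_ids = demo['Class ID']        # bracket notation handles names with spaces
subset = demo[['Name', 'Score']]    # a list of names selects several columns

# Look up the position of a column by name, then select by position
score_position = demo.columns.get_loc('Score')
scores_by_position = demo.iloc[:, score_position]

print(score_position)   # 3
```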

It's also possible to select a particular cell.

In [9]:
df.iloc[0]['Score'] # This will access the Score value in the first row ([0])

63
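
The same cell can also be reached in a single indexing step, which avoids building an intermediate row first. This is a sketch of three alternatives on a stand-in data frame `demo` (a hypothetical name); `loc`, `at`, and `iat` are standard pandas indexers.

```python
import pandas as pd

# Stand-in table; `demo` is a hypothetical name used only for this sketch
demo = pd.DataFrame({
    'Name': ['Adam', 'Becky', 'Charlie'],
    'Class ID': [1, 2, 3],
    'Age': [28, 27, 28],
    'Score': [63, 72, 82],
})

value_loc = demo.loc[0, 'Score']   # row label and column name
value_at = demo.at[0, 'Score']     # like loc, but for a single cell only
value_iat = demo.iat[0, 3]         # row position and column position

print(value_loc, value_at, value_iat)   # 63 63 63
```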

## Iterating through rows and accessing a particular column of each row

In some cases, we need to iterate through all the rows and do some computation with a particular column of each row.

In [10]:
# Print each row; iterrows() yields one (index, row) tuple per row
for row in df.iterrows():
    print(row)

(0, Name        Adam
Class ID       1
Age           28
Score         63
Name: 0, dtype: object)
(1, Name        Becky
Class ID        2
Age            27
Score          72
Name: 1, dtype: object)
(2, Name        Charlie
Class ID          3
Age              28
Score            82
Name: 2, dtype: object)
(3, Name        Daniel
Class ID         4
Age             22
Score           82
Name: 3, dtype: object)
(4, Name        Emily
Class ID        5
Age            24
Score          71
Name: 4, dtype: object)
(5, Name        Frank
Class ID        6
Age            21
Score          82
Name: 5, dtype: object)
(6, Name        Greta
Class ID        7
Age            26
Score          84
Name: 6, dtype: object)
(7, Name        Helen
Class ID        8
Age            27
Score          97
Name: 7, dtype: object)
(8, Name        Ian
Class ID      9
Age          23
Score        63
Name: 8, dtype: object)
(9, Name        Jack
Class ID      10
Age           26
Score         98
Name: 9, dtype: object)
(10,

In [11]:
# Iterate through all rows and select only the Score column
for index,row in df.iterrows():
    print(row['Score'])

63
72
82
82
71
82
84
97
63
98
94
99
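
For many per-row computations an explicit loop is not needed, because pandas can operate on a whole column at once, which is usually faster than `iterrows()`. The sketch below assumes a stand-in data frame `demo` and a passing cutoff of 70; both are hypothetical and chosen only for illustration. When a loop is still wanted, `itertuples()` is a lighter-weight alternative.

```python
import pandas as pd

# Stand-in table; `demo` is a hypothetical name used only for this sketch
demo = pd.DataFrame({'Name': ['Adam', 'Becky', 'Charlie'], 'Score': [63, 72, 82]})

# Column-wise (vectorized) operations replace the loop entirely
demo['Passed'] = demo['Score'] >= 70   # 70 is an assumed cutoff, not from the activity
mean_score = demo['Score'].mean()

# itertuples() yields one named tuple per row; columns become attributes
names_and_scores = [(row.Name, row.Score) for row in demo.itertuples(index=False)]

print(demo['Passed'].tolist())   # [False, True, True]
print(names_and_scores)
```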
