# Pandas

Let's practice with Pandas dataframes and series!

In [2]:
#Run these imports first
import numpy as np
import pandas as pd

print("numpy version:", np.__version__)
print("pandas version:", pd.__version__)

numpy version: 1.23.5
pandas version: 1.5.3


## Step-1: Series

Let's make a Series that includes one NaN (a missing value). Because NaN is a floating-point value, the whole Series gets dtype `float64`.


In [3]:
s = pd.Series([1,3,5,np.nan, 6, 8])
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

In [4]:
# Access one element of the series.
# s[4] looks up the index label 4; with the default zero-based index, that is the fifth element.

s[4]

6.0
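
As a small sketch of the access above, the snippet below contrasts label-based and position-based lookup and counts the missing value. The variable names (`by_label`, `by_position`, `missing_count`, `non_missing`) are illustrative only and not part of the original exercise.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

# With the default index (0..5), labels and positions happen to match,
# so both lookups return the same element.
by_label = s.loc[4]      # explicit label-based access
by_position = s.iloc[4]  # explicit position-based access

# NaN marks a missing value: isna() flags it, count() skips it.
missing_count = int(s.isna().sum())
non_missing = int(s.count())

print(by_label, by_position, missing_count, non_missing)
```

Using `.loc` or `.iloc` explicitly keeps the code unambiguous once a Series has a non-default index.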

## Step-2: Dataframes

Here we define a DataFrame from a dictionary: each key becomes a column name, and each list holds that column's values.

![](../assets/images/pandas-series-3.png)

In [5]:
df = pd.DataFrame({'Month' : ['Jan', 'Feb', 'Mar', 'Apr'],
                    'Sales': [10, 20, 30, 40]})
df

  Month  Sales
0   Jan     10
1   Feb     20
2   Mar     30
3   Apr     40


### 2.1 - `df.info`

In [6]:
# df.info() prints a summary: index range, column names, non-null counts, dtypes and memory usage

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Month   4 non-null      object
 1   Sales   4 non-null      int64 
dtypes: int64(1), object(1)
memory usage: 192.0+ bytes


### 2.2 - `df.dtypes`

Find the data type (dtype) of each column.

In [7]:
df.dtypes

Month    object
Sales     int64
dtype: object
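
The dtypes can also be inspected programmatically. The sketch below is one possible way to do that, assuming the same small `df` as above; `sales_dtype` and `numeric_columns` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Month': ['Jan', 'Feb', 'Mar', 'Apr'],
                   'Sales': [10, 20, 30, 40]})

# df.dtypes is itself a Series indexed by column name.
sales_dtype = df.dtypes['Sales']

# select_dtypes keeps only the columns whose dtype matches.
numeric_columns = list(df.select_dtypes(include='number').columns)

print(sales_dtype, numeric_columns)
```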

### 2.3 - Update the df

In [8]:
# Set sales for March = 300
## TODO: what are the row and column positions for 'Mar'?
df.iloc[2,1] = 300
df

  Month  Sales
0   Jan     10
1   Feb     20
2   Mar    300
3   Apr     40


In [9]:
## TODO: Try setting some other Sales numbers
## Remember, both row and column positions start at 0
## Try iloc[0,0] - which cell does it modify?

# your code here
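
For comparison with `iloc`, here is a minimal sketch of a label-based update with `loc`, which selects rows by a condition instead of by position. The value 400 for April is an arbitrary number chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Month': ['Jan', 'Feb', 'Mar', 'Apr'],
                   'Sales': [10, 20, 30, 40]})

# Position-based: row 2, column 1 (both counted from 0).
df.iloc[2, 1] = 300

# Label-based alternative: rows where Month is 'Apr', column 'Sales'.
df.loc[df['Month'] == 'Apr', 'Sales'] = 400

print(df)
```

The `loc` form keeps working if the rows are later reordered, whereas the `iloc` form depends on 'Mar' staying at position 2.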

## Step-3: Load data from CSV file

We will use the `pd.read_csv` function.  **This is the most common way to create dataframes.**

**TODO: Inspect the data file using the file explorer**

**TODO: Try reading both local and remote data by changing the path or URL passed to `pd.read_csv`**

### Read a local file

In [10]:
rain = pd.read_csv('../data/rainfall.csv')
rain

            City Month  Rainfall
0  San Francisco   Jan      10.0
1        Seattle   Jan      30.0
2    Los Angeles   Jan       2.0
3        Seattle   Feb      20.0
4  San Francisco   Feb       4.0
5    Los Angeles   Feb       0.0
6        Seattle   Mar      22.0
7  San Francisco   Mar       4.0
8    Los Angeles   Mar       NaN
9        Seattle   Apr       NaN


### Read a remote file

In [11]:
rain = pd.read_csv('https://raw.githubusercontent.com/elephantscale/python-data-analytics/main/data/rainfall.csv')
rain

            City Month  Rainfall
0  San Francisco   Jan      10.0
1        Seattle   Jan      30.0
2    Los Angeles   Jan       2.0
3        Seattle   Feb      20.0
4  San Francisco   Feb       4.0
5    Los Angeles   Feb       0.0
6        Seattle   Mar      22.0
7  San Francisco   Mar       4.0
8    Los Angeles   Mar       NaN
9        Seattle   Apr       NaN
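
To experiment without the data file or a network connection, `pd.read_csv` also accepts a file-like object. The sketch below feeds it a few rows through `io.StringIO`; the inline text is an assumed layout modelled on the output above, not the actual contents of `rainfall.csv`, and `rain_sample` is an illustrative name.

```python
import io
import pandas as pd

# Assumed CSV layout: a header row, then one line per record.
# The last two rows leave Rainfall empty on purpose.
csv_text = """City,Month,Rainfall
San Francisco,Jan,10.0
Seattle,Jan,30.0
Los Angeles,Mar,
Seattle,Apr,
"""

rain_sample = pd.read_csv(io.StringIO(csv_text))

# Empty fields in a numeric column are read as NaN.
missing = int(rain_sample['Rainfall'].isna().sum())

print(rain_sample)
print("missing Rainfall values:", missing)
```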
