<div style="max-width:66ch;">

# Lecture notes - Pandas Series and DataFrame

These are the lecture notes for **Pandas Series** and **DataFrame**. They build on content from a previous course:
- Python programming

<p class = "alert alert-info" role="alert"><b>Note</b> that these lecture notes give only a brief introduction to Pandas. I encourage you to read further about Pandas.</p>

</div>


<div style="max-width:66ch;">

## Pandas Series

1D array with flexible indices. A Series can be seen as a "typed dictionary". The typing makes it more efficient than a dictionary in certain computations.
- create from dictionary 
- create from list 
- create from array 

</div>

In [7]:
import pandas as pd 
data = dict(INKÖP = 25, OPA = 30 , JS = 30, JAVA = 27) # number of students

series_programs = pd.Series(data=data)
print(series_programs)

# extract values 
print(f"series_programs.iloc[0] -> {series_programs.iloc[0]}")
print(f"series_programs.iloc[-1] -> {series_programs.iloc[-1]}")

# get the keys
print(f"series_programs.keys() -> {series_programs.keys()}") 
print(f"series_programs.keys()[2] -> {series_programs.keys()[2]}") 

INKÖP    25
OPA      30
JS       30
JAVA     27
dtype: int64
series_programs.iloc[0] -> 25
series_programs.iloc[-1] -> 27
series_programs.keys() -> Index(['INKÖP', 'OPA', 'JS', 'JAVA'], dtype='object')
series_programs.keys()[2] -> JS
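
The "typed dictionary" idea can be sketched as follows. This is a minimal example that reuses the program data from the cell above; the variable names `has_js`, `js_students`, `doubled` and `total` are only chosen for illustration.

```python
import pandas as pd

# same data as above: number of students per program
series_programs = pd.Series(dict(INKÖP=25, OPA=30, JS=30, JAVA=27))

# dictionary-like behaviour: membership test on the labels and lookup by label
has_js = "JS" in series_programs
js_students = series_programs["JS"]

# array-like behaviour: one operation is applied to all values, no explicit loop
doubled = series_programs * 2
total = series_programs.sum()

print(has_js, js_students, total)
print(doubled)
```

All values share one dtype, which is what lets Pandas run operations like `* 2` and `sum()` on the whole Series at once.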


In [8]:
import random as rnd
rnd.seed(42)

# create Series using list
dice_series = pd.Series([rnd.randint(1,6) for _ in range(5)])
print(dice_series)

# some useful methods
print(f"Min value {dice_series.min()}")
print(f"Mean value {dice_series.mean()}")
print(f"Median value {dice_series.median()}")

0    6
1    1
2    1
3    6
4    3
dtype: int64
Min value 1
Mean value 3.4
Median value 3.0
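
The Series section also lists creation from an array. A minimal sketch of that case, assuming a small NumPy array of made-up grades and made-up labels:

```python
import numpy as np
import pandas as pd

# hypothetical values, only for illustration
grades = np.array([3.5, 4.0, 2.5])

# without index argument: default integer index 0, 1, 2
series_default = pd.Series(grades)

# with index argument: explicit labels, one per value
series_labeled = pd.Series(grades, index=["Ada", "Bo", "Cy"])

print(series_default)
print(series_labeled)
print(f"series_labeled['Bo'] -> {series_labeled['Bo']}")
```

The Series takes its dtype from the array, here `float64`.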


<div style="max-width:66ch;">

## DataFrame
Analog of a 2D NumPy array with flexible row indices and column names. Can also be seen as a specialized dictionary where each column name maps to a Series object.

- notice that most DataFrame methods return a new object instead of changing the original, which means that you have to assign the result to a variable for the changes to persist, unless you specify `inplace=True` for those methods that provide this parameter.

</div>
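
The note on return values can be illustrated with `rename`, one of the methods that has an `inplace` parameter. This is a minimal sketch; the small DataFrame and the names `renamed`, `df_copy` and `result` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Students": [25, 30, 30, 27]},
    index=["AI", ".NET", "APP", "Java"],
)

# rename returns a new DataFrame, df itself is unchanged
renamed = df.rename(columns={"Students": "Num students"})
print(list(df.columns))       # ['Students']
print(list(renamed.columns))  # ['Num students']

# with inplace=True the object is changed and the method returns None
df_copy = df.copy()
result = df_copy.rename(columns={"Students": "Num students"}, inplace=True)
print(result)                 # None
print(list(df_copy.columns))  # ['Num students']
```

Assigning the result, as in `renamed = df.rename(...)`, is usually the easier style to follow since the original stays available.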

In [9]:
df_programs = pd.DataFrame(series_programs,columns=("Num students",))
df_programs

Unnamed: 0,Num students
INKÖP,25
OPA,30
JS,30
JAVA,27


In [10]:
# create 2 Series objects using dictionary
students = pd.Series(dict(AI = 25, NET = 30 , APP = 30, Java = 27))
language = pd.Series(dict(AI="Python", NET="C#", APP="Kotlin", Java = "Java"))

# create a DataFrame from 2 Series objects using dictionary
df_programs = pd.DataFrame({"Students":students, "Language":language}) # key becomes col name
df_programs

Unnamed: 0,Students,Language
AI,25,Python
NET,30,C#
APP,30,Kotlin
Java,27,Java


In [11]:
import numpy as np
# can also be created directly
df_programs = pd.DataFrame({
    "Students": np.array((25, 30, 30, 27)),
    "Language": np.array(("Python", "C#", "Kotlin", "Java"))},
    index = ["AI", ".NET", "APP", "Java"])
df_programs

Unnamed: 0,Students,Language
AI,25,Python
.NET,30,C#
APP,30,Kotlin
Java,27,Java


In [12]:
# dtype object is used for text or for a mix of numeric and non-numeric values
df_programs.index

Index(['AI', '.NET', 'APP', 'Java'], dtype='object')

<div style="max-width:66ch;">

## Data selection

  
Can select 
- column(s) with bracket notation (dictionary-style indexing)
- column(s) with attribute-style indexing
    - can give unexpected results, as a column name can clash with the name of an existing method or attribute
- row(s) with iloc (integer-based indexing)
- row(s) with loc (label-based indexing)
- row(s) with boolean indexing
- and some more selection options, which we'll cover throughout the course

</div>

In [13]:
# gives a Series object of Students 
df_programs["Students"] # dictionary indexing

AI      25
.NET    30
APP     30
Java    27
Name: Students, dtype: int64

In [14]:
# select multiple columns using list 
df_programs[["Language", "Students"]]

Unnamed: 0,Language,Students
AI,Python,25
.NET,C#,30
APP,Kotlin,30
Java,Java,27


In [15]:
df_programs.Language # attribute indexing

AI      Python
.NET        C#
APP     Kotlin
Java      Java
Name: Language, dtype: object
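
The pitfall with attribute-style indexing can be sketched as follows. The column names `count` and `Num students` are hypothetical and chosen to provoke the problem: `count` is also the name of a DataFrame method.

```python
import pandas as pd

# hypothetical column names, chosen to clash with attribute access
df = pd.DataFrame(
    {"count": [25, 30], "Num students": [25, 30]},
    index=["AI", "NET"],
)

# bracket notation always gives the column
count_column = df["count"]
print(type(count_column).__name__)  # Series

# attribute access finds the DataFrame.count method, not the column
count_attribute = df.count
print(callable(count_attribute))    # True

# a column name with a space can only be reached with bracket notation
students = df["Num students"]
print(students)
```

Bracket notation works for every column name, so it is the safer default.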

In [16]:
df_programs["Language"][".NET"] # selects the Language Series and indexes .NET

'C#'

<div style="max-width:66ch;">

## Indexers

Indexers give a slicing and indexing interface for the rows. `loc` and `iloc` are attributes of both Series and DataFrame objects.

<table style="display:inline-block; text-align:left;">
  <tr style="background-color: #174A7E; color: white;">
      <th style="text-align:center">Indexer</th>
      <th style="text-align:left">Description</th>
  </tr>
      <td style="text-align:center">loc</td>
      <td style="text-align:left">slicing and indexing by explicit index labels; the end label of a slice is included</td>
    </tr>
    <tr>
      <td style="text-align:center">iloc</td>
      <td style="text-align:left">slicing and indexing by implicit integer position (Python-style, zero-based); the end position of a slice is excluded</td>
    </tr>
</table>

</div>


In [17]:
print(df_programs.loc["Java"])

# index multiple rows
df_programs.loc[["Java", "APP"]]

Students      27
Language    Java
Name: Java, dtype: object


Unnamed: 0,Students,Language
Java,27,Java
APP,30,Kotlin


In [18]:
# slicing with array-style indices
df_programs.iloc[1:3]

Unnamed: 0,Students,Language
.NET,30,C#
APP,30,Kotlin
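
The difference between `loc` and `iloc` is easiest to see when the labels are integers that don't match the positions. A minimal sketch with a made-up Series:

```python
import pandas as pd

# hypothetical Series where the integer labels differ from the positions
s = pd.Series(["a", "b", "c", "d"], index=[1, 3, 5, 7])

by_label = s.loc[3]      # label 3
by_position = s.iloc[3]  # position 3, the fourth value
print(by_label, by_position)  # b d

# label slice includes the end label, position slice excludes the end position
label_slice = s.loc[1:5]
position_slice = s.iloc[1:3]
print(list(label_slice))      # ['a', 'b', 'c']
print(list(position_slice))   # ['b', 'c']
```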


<div style="max-width:66ch;">

## Masking
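Selects the rows where the condition is True. The condition gives a boolean Series (the mask), and indexing with it keeps only the rows marked True.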

```py
df = df[conditions]
```

</div>

In [19]:
print(df_programs["Students"] > 25) # this gives a pandas Series of type bool 

df_over_25 = df_programs[df_programs["Students"]>25]
df_over_25

AI      False
.NET     True
APP      True
Java     True
Name: Students, dtype: bool


Unnamed: 0,Students,Language
.NET,30,C#
APP,30,Kotlin
Java,27,Java
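
Several conditions can be combined into one mask. A minimal sketch that rebuilds the same `df_programs` as above; the names `mask`, `df_filtered` and `languages` are only for illustration.

```python
import pandas as pd

df_programs = pd.DataFrame(
    {
        "Students": [25, 30, 30, 27],
        "Language": ["Python", "C#", "Kotlin", "Java"],
    },
    index=["AI", ".NET", "APP", "Java"],
)

# & means "and", | means "or", ~ means "not"
# each condition needs its own parentheses
mask = (df_programs["Students"] > 25) & (df_programs["Language"] != "Java")
df_filtered = df_programs[mask]
print(df_filtered)

# loc accepts a mask for the rows together with a column selection
languages = df_programs.loc[mask, "Language"]
print(list(languages))  # ['C#', 'Kotlin']
```

Python's own `and` and `or` don't work here, since they can't compare two Series element by element.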


<div style="max-width:66ch;">

## Summary

In this lecture we've covered the very basics of Pandas as a data processing library, where we've gone through Series and DataFrame objects, data selection, the `loc` and `iloc` indexers and masking.

</div>

<div style="background-color: #FFF; color: #212121; border-radius: 1px; width:22ch; box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px; display: flex; justify-content: center; align-items: center;">
<div style="padding: 1.5em 0; width: 70%;">
    <h2 style="font-size: 1.2rem;">Kokchun Giang</h2>
    <a href="https://www.linkedin.com/in/kokchungiang/" target="_blank" style="display: flex; align-items: center; gap: .4em; color:#0A66C2;">
        <img src="https://content.linkedin.com/content/dam/me/business/en-us/amp/brand-site/v2/bg/LI-Bug.svg.original.svg" width="20"> 
        LinkedIn profile
    </a>
    <a href="https://github.com/kokchun/Portfolio-Kokchun-Giang" target="_blank" style="display: flex; align-items: center; gap: .4em; margin: 1em 0; color:#0A66C2;">
        <img src="https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png" width="20"> 
        Github portfolio
    </a>
    <span>AIgineer AB</span>
</div>
</div>
