# pandas Tutorial

This short tutorial was developed for ChBE students, especially those taking CHBE444. Please email comments and suggestions to Prof. Ganesh Sriram (gsriram@umd.edu).

## Introduction to pandas

+ pandas is a Python library for storing and manipulating data arranged in tables (rows and columns).
+ People use pandas when they work with tabular data such as spreadsheets, Excel files, or Google Sheets.
+ As a simple example, let us make a table of the required CHBE courses in our program, organized by semester.

In [1]:
```python
import pandas as pd

# Each key is a semester (column name); each list holds that semester's
# required CHBE courses, padded with '' so that all columns have equal length.
df = pd.DataFrame({
    'first_year_fall': ['', '', ''],
    'first_year_spring': ['CHBE101', '', ''],
    'sophomore_fall': ['CHBE250', 'CHBE301', ''],
    'sophomore_spring': ['CHBE302', '', ''],
    'junior_fall': ['CHBE410', 'CHBE422', 'CHBE440'],
    'junior_spring': ['CHBE424', 'CHBE426', ''],
    'senior_fall': ['CHBE437', 'CHBE442', 'CHBE444'],
    'senior_spring': ['CHBE446', '', '']
})  # define the dataframe

df  # the last expression in a notebook cell is displayed; display(df) also works
```

|   | first_year_fall | first_year_spring | sophomore_fall | sophomore_spring | junior_fall | junior_spring | senior_fall | senior_spring |
|---|---|---|---|---|---|---|---|---|
| 0 |   | CHBE101 | CHBE250 | CHBE302 | CHBE410 | CHBE424 | CHBE437 | CHBE446 |
| 1 |   |   | CHBE301 |   | CHBE422 | CHBE426 | CHBE442 |   |
| 2 |   |   |   |   | CHBE440 |   | CHBE444 |   |


+ Note that we padded the shorter columns with empty strings ('') so that every column has the same length; `pd.DataFrame` raises a `ValueError` when the lists in the dictionary have different lengths. You can also write a bit of code to do the padding.
+ We could also have read the dataframe from an Excel file. See the `pandas.read_excel` documentation for the syntax: https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html.
+ pandas offers display and styling tools (e.g., `df.style`) to make the table look prettier.
+ If the table contained numbers, e.g., stream data from a chemical process, we could perform tabular calculations on it, e.g., material balances or heat exchanger cost calculations.
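The padding mentioned in the first bullet can be automated. Below is a minimal sketch, assuming the course lists start out with unequal lengths; the dictionary `courses` and the names `n_rows` and `padded` are hypothetical and used only for illustration. It finds the longest list and appends empty strings to the others.

```python
import pandas as pd

# Hypothetical input: course lists of unequal length, keyed by semester
# (a subset of the table above, before any '' padding).
courses = {
    'first_year_fall': [],
    'first_year_spring': ['CHBE101'],
    'sophomore_fall': ['CHBE250', 'CHBE301'],
    'junior_fall': ['CHBE410', 'CHBE422', 'CHBE440'],
}

# Length of the longest course list
n_rows = max(len(v) for v in courses.values())

# Append '' to each list until it has n_rows entries
padded = {k: v + [''] * (n_rows - len(v)) for k, v in courses.items()}

df = pd.DataFrame(padded)
print(df)
```

An alternative is to wrap each list in `pd.Series`: pandas aligns Series on their index and fills the missing entries with `NaN`, which `df.fillna('')` can then replace with empty strings.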

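In [2]:
```python
import pandas as pd
# display() is built into Jupyter/IPython; in a plain Python script,
# use print(df) or: from IPython.display import display

# Let us process a dataframe with numbers
df = pd.DataFrame({
    'x': [2, 7, 4],
    'y': [3, 1, -2]
})  # define the dataframe

display(df)

# Compute x**2 + y**2 from two columns in two equivalent ways:
# 'f' applies a function to each row (axis=1 passes rows, not columns);
# 'g' uses vectorized column arithmetic, which is shorter and faster.
df['f'] = df.apply(lambda row: row['x']**2 + row['y']**2, axis=1)
df['g'] = df['x']**2 + df['y']**2

display(df)
```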

|   | x | y |
|---|---|---|
| 0 | 2 | 3 |
| 1 | 7 | 1 |
| 2 | 4 | -2 |


|   | x | y | f | g |
|---|---|---|---|---|
| 0 | 2 | 3 | 13 | 13 |
| 1 | 7 | 1 | 50 | 50 |
| 2 | 4 | -2 | 20 | 20 |
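Columns `f` and `g` are identical, which shows that row-wise `apply` and vectorized column arithmetic give the same result here. The same column arithmetic supports the material balances mentioned earlier. The sketch below uses made-up stream data (hypothetical, not from a real process): two inlet streams enter a steady-state mixer with no reaction, and the outlet flow and composition are obtained by summing the component flows.

```python
import pandas as pd

# Hypothetical data: two inlet streams to a mixer, with total mass flow
# (kg/h) and mass fraction of component A (the remainder is component B).
streams = pd.DataFrame({
    'stream': ['S1', 'S2'],
    'F_total': [100.0, 50.0],  # kg/h
    'w_A': [0.20, 0.50],       # mass fraction of A
})

# Vectorized column arithmetic: component mass flows in each stream
streams['F_A'] = streams['F_total'] * streams['w_A']
streams['F_B'] = streams['F_total'] - streams['F_A']

# Steady-state mixer balance (no reaction): outlet = sum of inlets
F_A_out = streams['F_A'].sum()
F_total_out = streams['F_total'].sum()
w_A_out = F_A_out / F_total_out

print(streams)
print(f"Outlet: F_total = {F_total_out} kg/h, w_A = {w_A_out:.3f}")
```

With these numbers, the outlet carries 150 kg/h with a mass fraction of A of 0.300. Adding more streams only means adding rows; the balance code stays the same.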
