# Python Tutorials: Pandas DataFrame

### Author: Dr. Owen Chen

### Date: 2023 Fall



<a id = "pandas"></a>
# Pandas DataFrame
[Go back to Table of Contents](#toc)

*   One of the most widely used data structures in data science
*   A table-like, two-dimensional data structure


### Create a DataFrame

In [None]:
import pandas as pd
first_names = ['henry', 'rolly', 'molly', 'frank', 'david', 'steven', 'gwen', 'arthur']
last_names = ['smith', 'brocker', 'stein', 'bach', 'spencer', 'de wilde', 'mason', 'davis']
ages = [43, 23, 78, 56, 26, 14, 46, 92]

df = pd.DataFrame({ 'first': first_names, 'last': last_names, 'age': ages})
df

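|   | age | first | last |
|---|-----|-------|------|
| 0 | 43 | henry | smith |
| 1 | 23 | rolly | brocker |
| 2 | 78 | molly | stein |
| 3 | 56 | frank | bach |
| 4 | 26 | david | spencer |
| 5 | 14 | steven | de wilde |
| 6 | 46 | gwen | mason |
| 7 | 92 | arthur | davis |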
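
As a quick check on the "table-like, two-dimensional" description, the sketch below rebuilds the same DataFrame and inspects its shape, column labels, and row index. One caveat: recent pandas versions keep the dictionary's insertion order for columns, so the column order you see may differ from the output shown above, which appears to come from an older release that sorted the column names.

```python
import pandas as pd

first_names = ['henry', 'rolly', 'molly', 'frank', 'david', 'steven', 'gwen', 'arthur']
last_names = ['smith', 'brocker', 'stein', 'bach', 'spencer', 'de wilde', 'mason', 'davis']
ages = [43, 23, 78, 56, 26, 14, 46, 92]

df = pd.DataFrame({'first': first_names, 'last': last_names, 'age': ages})

# shape is a (rows, columns) tuple; ndim is the number of axes
n_rows, n_cols = df.shape
print(n_rows, n_cols)        # 8 3
print(df.ndim)               # 2

# Column labels and the default integer row index
print(sorted(df.columns))    # ['age', 'first', 'last']
print(list(df.index))        # [0, 1, 2, 3, 4, 5, 6, 7]
```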


### Head - looking at the top

In [None]:
df.head(10)

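|   | age | first | last |
|---|-----|-------|------|
| 0 | 43 | henry | smith |
| 1 | 23 | rolly | brocker |
| 2 | 78 | molly | stein |
| 3 | 56 | frank | bach |
| 4 | 26 | david | spencer |
| 5 | 14 | steven | de wilde |
| 6 | 46 | gwen | mason |
| 7 | 92 | arthur | davis |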


### Setting number of rows returned with head

In [None]:
df.head(3)

### Tail - looking at the bottom

In [None]:
df.tail(2)

|   | age | first | last |
|---|-----|-------|------|
| 6 | 46 | gwen | mason |
| 7 | 92 | arthur | davis |
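
For comparison, here is a small sketch of how the row count argument behaves for both `head` and `tail`. It rebuilds the tutorial's DataFrame so it can run on its own; when no argument is given, both methods return five rows.

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['henry', 'rolly', 'molly', 'frank', 'david', 'steven', 'gwen', 'arthur'],
    'last': ['smith', 'brocker', 'stein', 'bach', 'spencer', 'de wilde', 'mason', 'davis'],
    'age': [43, 23, 78, 56, 26, 14, 46, 92],
})

top_default = df.head()    # no argument: the first 5 rows
top_three = df.head(3)     # the first 3 rows
bottom_two = df.tail(2)    # the last 2 rows

print(len(top_default), len(top_three), len(bottom_two))  # 5 3 2
print(list(top_three['first']))                           # ['henry', 'rolly', 'molly']
print(list(bottom_two['first']))                          # ['gwen', 'arthur']
```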


### Describe - descriptive statistics

In [None]:
df.describe()

|   | age |
|---|-----|
| count | 8.0 |
| mean | 47.25 |
| std | 27.227874 |
| min | 14.0 |
| 25% | 25.25 |
| 50% | 44.5 |
| 75% | 61.5 |
| max | 92.0 |
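
Only `age` appears in the summary because, for a DataFrame with mixed column types, `describe()` reports on numeric columns by default. The sketch below shows two variations that may be useful: passing `include='all'` to summarize every column, and computing individual statistics directly on one column.

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['henry', 'rolly', 'molly', 'frank', 'david', 'steven', 'gwen', 'arthur'],
    'last': ['smith', 'brocker', 'stein', 'bach', 'spencer', 'de wilde', 'mason', 'davis'],
    'age': [43, 23, 78, 56, 26, 14, 46, 92],
})

# Summarize text and numeric columns together
summary_all = df.describe(include='all')
print(summary_all)

# Individual statistics on a single column
mean_age = df['age'].mean()
median_age = df['age'].median()
print(mean_age, median_age)   # 47.25 44.5
```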


### Access one column

In [None]:
df['first']

0     henry
1     rolly
2     molly
3     frank
4     david
5    steven
6      gwen
7    arthur
Name: first, dtype: object
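
A related pattern, sketched below, is selecting several columns at once. Passing a single label returns a Series, while passing a list of labels returns a smaller DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['henry', 'rolly', 'molly', 'frank', 'david', 'steven', 'gwen', 'arthur'],
    'last': ['smith', 'brocker', 'stein', 'bach', 'spencer', 'de wilde', 'mason', 'davis'],
    'age': [43, 23, 78, 56, 26, 14, 46, 92],
})

one_column = df['first']             # a single label gives a Series
two_columns = df[['first', 'age']]   # a list of labels gives a DataFrame

print(type(one_column).__name__)     # Series
print(type(two_columns).__name__)    # DataFrame
print(two_columns.shape)             # (8, 2)
```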

### Slice a column

In [None]:
df['first'][4:]

4     david
5    steven
6      gwen
7    arthur
Name: first, dtype: object
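
The same slice can also be written with the explicit indexers `iloc` (by position) and `loc` (by label). The sketch below is one way to do it; note that a `loc` slice includes its end label, while an `iloc` slice excludes its end position.

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['henry', 'rolly', 'molly', 'frank', 'david', 'steven', 'gwen', 'arthur'],
    'last': ['smith', 'brocker', 'stein', 'bach', 'spencer', 'de wilde', 'mason', 'davis'],
    'age': [43, 23, 78, 56, 26, 14, 46, 92],
})

# Position-based: everything from position 4 onward
by_position = df['first'].iloc[4:]
print(list(by_position))        # ['david', 'steven', 'gwen', 'arthur']

# Label-based: rows labeled 4 through 5, end label included
by_label = df.loc[4:5, 'first']
print(list(by_label))           # ['david', 'steven']

# Position-based with the same bounds: end position excluded
print(list(df['first'].iloc[4:5]))   # ['david']
```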

### Use conditions to filter

In [None]:
df[df['age'] > 50]

|   | age | first | last |
|---|-----|-------|------|
| 2 | 78 | molly | stein |
| 3 | 56 | frank | bach |
| 7 | 92 | arthur | davis |
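
Conditions can also be combined. The sketch below uses `&` (and) and `|` (or); each condition needs its own parentheses because these operators bind more tightly than the comparisons. The age thresholds are arbitrary values chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['henry', 'rolly', 'molly', 'frank', 'david', 'steven', 'gwen', 'arthur'],
    'last': ['smith', 'brocker', 'stein', 'bach', 'spencer', 'de wilde', 'mason', 'davis'],
    'age': [43, 23, 78, 56, 26, 14, 46, 92],
})

# Both conditions must hold
between_20_and_50 = df[(df['age'] > 20) & (df['age'] < 50)]
print(list(between_20_and_50['first']))   # ['henry', 'rolly', 'david', 'gwen']

# Either condition may hold
under_20_or_over_90 = df[(df['age'] < 20) | (df['age'] > 90)]
print(list(under_20_or_over_90['first'])) # ['steven', 'arthur']
```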


<a id = "series"></a>
# Pandas Series

[Go back to Table of Contents](#toc)

*   A one dimensional labeled array
*   Contains data of only one type
*   Similar to a column in a spreadsheet




### Create a series

In [None]:
pd_series = pd.Series( [1, 2, 3 ] )
pd_series

0    1
1    2
2    3
dtype: int64
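
To illustrate the "labeled" and "one type" points, the sketch below creates a Series with custom index labels and another from mixed integers and floats. The labels and values are made up for this example.

```python
import pandas as pd

# The index does not have to be 0, 1, 2, ...; any labels can be supplied
labeled = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
value_b = labeled['b']
print(value_b)          # 20

# Mixing ints and floats yields a single common dtype for the whole Series
mixed = pd.Series([1, 2.5, 3])
print(mixed.dtype)      # float64
```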

### Series introspection methods

In [None]:
f"This series is made up of {pd_series.size} items whose data type is {pd_series.dtype}"

'This series is made up of 3 items whose data type is int64'

### A Pandas DataFrame is composed of Pandas Series

In [None]:
age = df.age
type( age )

pandas.core.series.Series
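
The relationship works in both directions, as the sketch below suggests: every column pulled out of the DataFrame is a Series, and a DataFrame can be assembled again from those Series. A single row selected with `iloc` is also a Series, indexed by the column names.

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['henry', 'rolly', 'molly', 'frank', 'david', 'steven', 'gwen', 'arthur'],
    'last': ['smith', 'brocker', 'stein', 'bach', 'spencer', 'de wilde', 'mason', 'davis'],
    'age': [43, 23, 78, 56, 26, 14, 46, 92],
})

# Every column is a Series
columns_are_series = all(isinstance(df[name], pd.Series) for name in df.columns)
print(columns_are_series)    # True

# Rebuild a DataFrame from the individual column Series
rebuilt = pd.DataFrame({name: df[name] for name in df.columns})
same = rebuilt.equals(df)
print(same)                  # True

# A single row is a Series too, labeled by column name
first_row = df.iloc[0]
print(first_row['first'])    # henry
```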

### Some useful helper methods of a Series

#### Mean

In [None]:
pd_series = pd.Series([ 1, 2, 3, 5, 6, 6, 6, 7, 8])
pd_series.mean()

4.888888888888889

#### Unique

In [None]:
pd_series.unique()

array([1, 2, 3, 5, 6, 7, 8])
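
Two closely related helpers, sketched below on the same Series, are `nunique()`, which counts the distinct values, and `value_counts()`, which reports how often each value occurs.

```python
import pandas as pd

pd_series = pd.Series([1, 2, 3, 5, 6, 6, 6, 7, 8])

# Number of distinct values
n_distinct = pd_series.nunique()
print(n_distinct)            # 7

# Frequency of each value, most frequent first
counts = pd_series.value_counts()
times_six = counts.loc[6]
print(times_six)             # 3
```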

#### Min

In [None]:
pd_series.min()

1

# References:
[Numpy arrays](https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html)

[Pandas DataFrame](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.html)

[Pandas Series](https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.Series.html)

