Name: Soji Ademiluyi  
Email: aademilu@uncc.edu

A DataFrame is a 2-dimensional, labeled data structure whose columns can hold different types of data. It can be built from lists as well as dictionaries.

In [1]:
import pandas as pd

By convention, pandas is imported as `pd`: `import pandas as pd`   
When a DataFrame is built from a dictionary, the keys become the column names and the values become the rows of each column.

In [16]:
df = pd.DataFrame(
    {
        "Size": [
            "Small",
            "Large",
            "Medium",
        ],
        "Item": ["Sad Burger", "Depressed Burger", "Therapy Burger"],
        "Price": ["50$", "30$", "20$"],
    }
)


In [17]:
df

Unnamed: 0,Size,Item,Price
0,Small,Sad Burger,50$
1,Large,Depressed Burger,30$
2,Medium,Therapy Burger,20$


In [5]:
dr = pd.DataFrame([10,20,30])

In [6]:
dr 

Unnamed: 0,0
0,10
1,20
2,30


In [10]:
dr.shape

(3, 1)

In [18]:

dl = pd.DataFrame([[20,20,20], [20,20], [30,30]])

When the input rows have different lengths, pandas creates enough columns (along axis 1) to fit the longest row and fills the missing entries of the shorter rows with NaN.

In [19]:
dl

Unnamed: 0,0,1,2
0,20,20,20.0
1,20,20,
2,30,30,
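
To make the NaN padding easier to inspect, here is a minimal sketch that rebuilds the same ragged input and counts the missing cells per column. The names `ragged` and `missing_per_column` are illustrative and not part of the notebook above.

```python
import pandas as pd

# Rows of unequal length: the first has three values, the others have two.
ragged = pd.DataFrame([[20, 20, 20], [20, 20], [30, 30]])

# isna() marks the missing cells; sum() counts them per column.
missing_per_column = ragged.isna().sum()
print(missing_per_column)

# Column 2 contains NaN, so pandas stores it as a floating-point column.
print(ragged.dtypes)
```

This is also why column 2 is displayed as `20.0` rather than `20` in the output above.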


In [20]:
dl[0]

0    20
1    20
2    30
Name: 0, dtype: int64

Each column is a Series that can be selected on its own.

In [None]:
df["Size"]

0     Small
1     Large
2    Medium
Name: Size, dtype: object

In [33]:
age = pd.Series([200, 35, 58, 28, 22, 35, 58, 28], name="Age")

In [41]:
print(age.describe())

count      8.000000
mean      58.000000
std       58.940648
min       22.000000
25%       28.000000
50%       35.000000
75%       58.000000
max      200.000000
Name: Age, dtype: float64


### Importing and Exporting  
Importing and exporting can be done with a wide variety of file types.

`pd.read_csv()`  
`pd.read_json()`  
`pd.read_excel()`    

After loading the data, various operations can be run on it. We can use `.tail(20)` to return the last 20 rows, or `.dtypes` to return the data type of each column.   
For a broader summary of your dataframe, you can use `.info()`.  
We can write the data we've modified to a file with a method such as `.to_excel()`.  
For example, if you want to send your dataframe to Excel:  
`data.to_excel("data.xlsx", sheet_name="data", index=False)`
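
As a rough sketch of the import, inspect, and export cycle described above, the example below uses CSV instead of Excel so that it runs without an extra Excel engine installed. The CSV text and the names `csv_text`, `data`, `buffer`, and `exported` are assumptions made for illustration; in practice a file path would replace the in-memory buffers.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = (
    "Size,Item,Amount\n"
    "Small,Sad Burger,300\n"
    "Large,Depressed Burger,500\n"
    "Medium,Therapy Burger,200\n"
)

# read_csv accepts a path or any file-like object.
data = pd.read_csv(io.StringIO(csv_text))

print(data.tail(2))  # the last two rows
print(data.dtypes)   # the data type of each column
data.info()          # index range, non-null counts, dtypes, memory usage

# Write the frame back out without the index column.
buffer = io.StringIO()
data.to_csv(buffer, index=False)
exported = buffer.getvalue()
print(exported)
```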

Selecting a single column returns a one-dimensional Series object.

In [47]:
price = df["Price"]

In [48]:
price

0    50$
1    30$
2    20$
Name: Price, dtype: object

In [49]:
type(price)

pandas.core.series.Series

In [50]:
dr = pd.DataFrame(
    {
        "Size": [
            "Small",
            "Large",
            "Medium",
        ],
        "Item": ["Sad Burger", "Depressed Burger", "Therapy Burger"],
        "Price": ["50$", "30$", "20$"],
        "Amount": [300, 500, 200],
    }
)


In [51]:
dr

Unnamed: 0,Size,Item,Price,Amount
0,Small,Sad Burger,50$,300
1,Large,Depressed Burger,30$,500
2,Medium,Therapy Burger,20$,200


In [56]:
type(dr[["Amount"]])

pandas.core.frame.DataFrame

In [55]:
type(dr["Amount"])

pandas.core.series.Series

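Indexing with a single column label (`dr["Amount"]`) returns a Series, while indexing with a list of labels (`dr[["Amount"]]`) returns a DataFrame, even when the list holds only one label.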

In [61]:
print(dr["Amount"].shape,dr[["Amount"]].shape)

(3,) (3, 1)
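
The same rule extends to selecting several columns at once. The sketch below uses a small stand-in frame named `menu` (a hypothetical name, with data mirroring the notebook's example) to compare the three cases.

```python
import pandas as pd

# Illustrative frame; the values mirror the notebook's burger example.
menu = pd.DataFrame(
    {
        "Size": ["Small", "Large", "Medium"],
        "Item": ["Sad Burger", "Depressed Burger", "Therapy Burger"],
        "Amount": [300, 500, 200],
    }
)

one_column = menu["Amount"]              # single label -> Series
one_column_frame = menu[["Amount"]]      # list with one label -> DataFrame
two_columns = menu[["Item", "Amount"]]   # list with two labels -> DataFrame

print(type(one_column).__name__, one_column.shape)
print(type(one_column_frame).__name__, one_column_frame.shape)
print(type(two_columns).__name__, two_columns.shape)
```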


In [77]:
newDr = dr[dr["Amount"] > 250]
oldDr = dr[dr["Price"].isin(["20$", "30$"])]
oldestDR = dr[(dr["Price"] == "20$") | (dr["Price"] == "50$")]

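We can filter rows by applying a boolean condition to a specific column. Conditions can be combined with `|` (or), and `.isin()` matches any value in a list.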

In [73]:
newDr

Unnamed: 0,Size,Item,Price,Amount
0,Small,Sad Burger,50$,300
1,Large,Depressed Burger,30$,500


In [72]:
oldDr

Unnamed: 0,Size,Item,Price,Amount
1,Large,Depressed Burger,30$,500
2,Medium,Therapy Burger,20$,200


In [78]:
oldestDR

Unnamed: 0,Size,Item,Price,Amount
0,Small,Sad Burger,50$,300
2,Medium,Therapy Burger,20$,200


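`.notna()` returns a boolean mask that is True wherever a value is not missing (not NaN), so it can be used to keep only the rows with known values.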
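
A minimal sketch of `.notna()` used as a filter, assuming a ragged frame like the one built earlier; the names `ragged`, `mask`, and `complete_rows` are illustrative.

```python
import pandas as pd

# Shorter rows are padded with NaN in column 2.
ragged = pd.DataFrame([[20, 20, 20], [20, 20], [30, 30]])

# True where column 2 has a value, False where it is NaN.
mask = ragged[2].notna()
print(mask)

# Keep only the rows whose column 2 is not missing.
complete_rows = ragged[mask]
print(complete_rows)
```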

We can use `loc` and `iloc` to select specific row and column combinations.  
`loc` selects by label or by boolean condition, for example:

In [81]:
newerDr = dr.loc[dr["Size"] == "Large", "Amount"]

In [83]:
newerDr

1    500
Name: Amount, dtype: int64

The syntax is rows first, then columns: `dr.loc[row_selector, column_selector]`.
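
The row and column selectors of `loc` can each be a single label, a list of labels, or a boolean mask. The sketch below shows two combinations on a stand-in frame; `menu`, `large_or_small`, and `item_at_one` are hypothetical names.

```python
import pandas as pd

# Illustrative frame; the values mirror the notebook's burger example.
menu = pd.DataFrame(
    {
        "Size": ["Small", "Large", "Medium"],
        "Item": ["Sad Burger", "Depressed Burger", "Therapy Burger"],
        "Amount": [300, 500, 200],
    }
)

# Boolean mask for the rows, list of labels for the columns -> DataFrame.
large_or_small = menu.loc[menu["Size"].isin(["Large", "Small"]), ["Item", "Amount"]]
print(large_or_small)

# Single row label and single column label -> one scalar value.
item_at_one = menu.loc[1, "Item"]
print(item_at_one)
```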

We can use `iloc` to do the same through integer positions.

In [84]:
newestDr = dr.iloc[1,0:]

In [85]:
newestDr

Size                 Large
Item      Depressed Burger
Price                  30$
Amount                 500
Name: 1, dtype: object

This returns the second row (position 1) for every column, starting from column 0.
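
One difference worth keeping in mind is how the two indexers treat slices: `iloc` excludes the stop position, as in ordinary Python slicing, while `loc` includes the stop label. A short sketch on a stand-in frame (`menu`, `by_position`, and `by_label` are illustrative names):

```python
import pandas as pd

# Illustrative frame with the default integer index 0, 1, 2.
menu = pd.DataFrame(
    {
        "Size": ["Small", "Large", "Medium"],
        "Item": ["Sad Burger", "Depressed Burger", "Therapy Burger"],
        "Amount": [300, 500, 200],
    }
)

# Positions 0 and 1 on both axes; the stop position 2 is excluded.
by_position = menu.iloc[0:2, 0:2]
print(by_position)

# Row labels 0 through 2 and columns "Size" through "Item"; both stops are included.
by_label = menu.loc[0:2, "Size":"Item"]
print(by_label)
```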

In [93]:
dr.iloc[0:3, 3] = 300

We can also assign new values to the rows and columns we select. Here, every row of the column at position 3 (`Amount`) is set to 300.

In [94]:
dr

Unnamed: 0,Size,Item,Price,Amount
0,Small,Sad Burger,50$,300
1,Large,Depressed Burger,30$,300
2,Medium,Therapy Burger,20$,300
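
Assignment works with `loc` as well, which allows updating only the rows that match a condition. A minimal sketch, assuming a stand-in frame named `menu` and an arbitrary replacement value of 999:

```python
import pandas as pd

# Illustrative frame; the values mirror the notebook's burger example.
menu = pd.DataFrame(
    {
        "Size": ["Small", "Large", "Medium"],
        "Item": ["Sad Burger", "Depressed Burger", "Therapy Burger"],
        "Amount": [300, 500, 200],
    }
)

# Update "Amount" only where "Size" is "Large"; other rows are left unchanged.
menu.loc[menu["Size"] == "Large", "Amount"] = 999
print(menu)
```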
