# Data Frames

A DataFrame is pandas's notion of a table.  Data frames are made of "Series", with each series representing one column of the table.  To get access to them we need to import the `pandas` module.  When you import a module in Python you can give it a short name of your own to use in the rest of the code; we use `pd`, as most pandas code does.  We start with a series, because there is no data frame without series.

# Series

A series is a list of values attached to an index.  The index appears on the left, the values on the right.  Below the values, pandas reports the type of data stored in the series (the `dtype`).  If you don't supply an index, pandas makes one up for you; it starts from 0 and goes up by one each "row".


In [1]:
import pandas as pd
 
my_series = pd.Series([5,6,7])
# this creates a pandas Series containing the numbers 5, 6, and 7
print(my_series)

0    5
1    6
2    7
dtype: int64


I can choose a different index with the notation below.  What you are seeing is a "dictionary", something we will get into more later: each index label is paired with its value.

In [3]:
adjusted_series = pd.Series({100:5,200:6,300:7})
print(adjusted_series)

100    5
200    6
300    7
dtype: int64
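As a small sketch of an alternative, the index labels can also be passed separately through the `index` argument of `pd.Series`.  The name `labelled_series` is made up for this illustration.

```python
import pandas as pd

# Same values as before, but the index labels are supplied separately.
labelled_series = pd.Series([5, 6, 7], index=[100, 200, 300])
print(labelled_series)

# The index labels and the values can be pulled out as plain Python lists.
print(labelled_series.index.tolist())   # [100, 200, 300]
print(labelled_series.tolist())         # [5, 6, 7]
```

This should give the same series as the dictionary form, so which one to use is mostly a matter of taste.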


Series _don't have column names_.  If you are familiar with other programming languages, they are like arrays, associative arrays, or dictionaries.
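To make the dictionary comparison concrete, here is a minimal sketch that treats the series from the previous cell like a dictionary.  It is rebuilt here so the example runs on its own.

```python
import pandas as pd

adjusted_series = pd.Series({100: 5, 200: 6, 300: 7})

# Look a value up by its index label, much like a dictionary key.
print(adjusted_series.loc[200])    # 6

# `in` tests the index labels, not the values.
print(200 in adjusted_series)      # True
print(6 in adjusted_series)        # False

# Convert the series back to a plain Python dictionary.
print(adjusted_series.to_dict())
```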

# Arithmetic on Series

pandas is built on the `numpy` library, which lets us do arithmetic on whole series as if they were single values.  When you add, multiply, subtract, or divide a series, you get another series back with all the values adjusted.

In [5]:
import numpy as np

print(adjusted_series) 

print("-- adding one --")
print(adjusted_series + 1)

print("-- multiply by two --")
print(adjusted_series * 2)

print("-- boolean operators work too --")
print(adjusted_series > 6)

100    5
200    6
300    7
dtype: int64
-- adding one --
100    6
200    7
300    8
dtype: int64
-- multiply by two --
100    10
200    12
300    14
dtype: int64
-- boolean operators work too --
100    False
200    False
300     True
dtype: bool
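Two follow-ups to the cell above, sketched under the assumption that `adjusted_series` holds the same values: a True/False series can be used to pick out rows, and arithmetic between two series pairs the values up by index label.  The names `big_values` and `doubled` are made up for this illustration.

```python
import pandas as pd

adjusted_series = pd.Series({100: 5, 200: 6, 300: 7})

# A comparison gives a True/False series; using it inside square brackets
# keeps only the rows marked True.
big_values = adjusted_series[adjusted_series > 5]
print(big_values)

# Arithmetic between two series pairs values up by index label.
doubled = adjusted_series + adjusted_series
print(doubled)
```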




# Data Frame

We want titles on our columns though, and we want multiple columns.  Data frames give us that.  We can promote a Series to a data frame:

In [6]:
first_frame = adjusted_series.to_frame()
first_frame

     0
100  5
200  6
300  7


It has given our one column a name, `0`.  I don't love that name; I would prefer "A".

In [7]:
second_frame = adjusted_series.to_frame("A")
second_frame

     A
100  5
200  6
300  7
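Promoting a series is not the only route to a frame.  As a hedged sketch, a frame with several titled columns can also be built in one step from a dictionary of columns.  The name `multi_frame` and the values in column "B" are invented for this example.

```python
import pandas as pd

# Each dictionary key becomes a column title; `index` labels the rows.
multi_frame = pd.DataFrame(
    {"A": [5, 6, 7], "B": [50, 60, 70]},
    index=[100, 200, 300],
)
print(multi_frame)
print(list(multi_frame.columns))   # ['A', 'B']
```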


You can get a series back from a dataframe using the _square bracket_ notation.

Note a trick that occurs here.  A Series can have a `name`, which is a bit like the column header for its single column of data.  When you pull a series from a frame, that `name` is populated with the column name.  You can see it at the bottom of the output.

In [8]:
print(second_frame["A"])
print(second_frame["A"][100])  # look up a single value by its index label

100    5
200    6
300    7
Name: A, dtype: int64
5
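The `name` can also be read directly.  The sketch below rebuilds `second_frame` so it runs on its own; `column_a` and `unnamed` are names made up for this illustration.

```python
import pandas as pd

second_frame = pd.Series({100: 5, 200: 6, 300: 7}).to_frame("A")

# A series pulled from a frame carries the column title as its name.
column_a = second_frame["A"]
print(column_a.name)               # A

# A series built directly has no name unless one is given.
unnamed = pd.Series([5, 6, 7])
print(unnamed.name)                # None

# `.loc` looks up a single cell by row label and column name.
print(second_frame.loc[100, "A"])  # 5
```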

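You can add a column using the same notation (this does the same job as `insert`, but it is a nicer form).  Note that a Series is matched to the frame by index label, so values whose labels don't exist in the frame end up as missing values (`NaN`).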

In [9]:
second_frame["B"] = pd.Series([50,50,70])
second_frame


     A   B
100  5 NaN
200  6 NaN
300  7 NaN


In [10]:
second_frame["C"] = [500,600,700]
second_frame

     A   B    C
100  5 NaN  500
200  6 NaN  600
300  7 NaN  700
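Column "B" above came out empty because the new series had the made-up index 0, 1, 2, which matches none of the row labels 100, 200, 300.  The sketch below contrasts the three cases; the frame and column names are invented for this illustration.

```python
import pandas as pd

frame = pd.Series({100: 5, 200: 6, 300: 7}).to_frame("A")

# Index 0, 1, 2 does not match 100, 200, 300, so every value is missing.
frame["mismatched"] = pd.Series([50, 60, 70])

# Giving the new series the same index labels lines the values up.
frame["matched"] = pd.Series([50, 60, 70], index=[100, 200, 300])

# A plain list has no index, so it is assigned by position.
frame["from_list"] = [500, 600, 700]

print(frame)
```

A list is the shorter option when the values are already in row order; a series with an explicit index is safer when the order might differ.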


# Exercise - add column

Adjust the following code block so that it adds a new column (called "B") to the one-column data frame `table`.  Column "B" should hold values one more (`+1`) than the corresponding existing value, i.e. replicate this spreadsheet table:

![simple table](imgs/small_table.png)

Everything you need was covered in this notebook, but you might have to get creative in how you combine the ideas.


In [25]:
table = pd.Series({1:5,2:6,3:7}).to_frame("A")

# put your code here such that the next print of table has all the new values in column "B"
table["B"] = table["A"] + 1

display(table)


   A  B
1  5  6
2  6  7
3  7  8
