# Creating DataFrames #

A pandas DataFrame holds tabular data arranged in rows and columns.

The DataFrame constructor recognizes these inputs:
- a dictionary whose values are pandas Series; each key becomes a column label
- a list of dictionaries; each dictionary becomes a row, and the keys become column labels


1) Create a dictionary of pandas Series - items = {k:v, k:v, k:v} where v = pd.Series()

OR create a list of dictionaries - items = [{k:v}, {k:v}, {k:v}]

2) Pass it to the DataFrame constructor, which returns a well-formatted table - df = pd.DataFrame(items)

3) The keys always become the column labels

Note: you can also select specific rows and columns with pd.DataFrame(X, index=Y, columns=Z)
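As a minimal sketch of the note above, the constructor's `index` and `columns` keyword arguments pick which rows and columns are kept. The names `carts` and `subset` and the sample values are illustrative and not part of the original notebook.

```python
import pandas as pd

# Hypothetical data in the dictionary-of-Series form described above.
carts = {
    'Bob': pd.Series([10, 20], index=['bike', 'pants']),
    'Mary': pd.Series([3, 4], index=['pen', 'pants']),
}

# Keep only the 'pants' row and the 'Mary' column.
subset = pd.DataFrame(carts, index=['pants'], columns=['Mary'])
print(subset)
```

Note that the keyword is `columns` (plural); the labels passed must match the dictionary keys and Series index labels to select existing data.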



In [75]:
import pandas as pd

def pp(z):
    print('--------------------')
    print('\n', z)

# just a regular dictionary
items = {'Bob':pd.Series([10, 20, 50], index=['bike', 'pants', 'watch']), 'Mary':pd.Series([400000, 15000, 3, 4], index=['house', 'car', 'pen', 'pants'])}

pp(items)

pp(type(items))



--------------------

 {'Bob': bike     10
pants    20
watch    50
dtype: int64, 'Mary': house    400000
car       15000
pen           3
pants         4
dtype: int64}
--------------------

 <class 'dict'>


## Think of DataFrames like a spreadsheet ##

1) Create a dictionary of Series
2) Pass it to the DataFrame constructor

The result displays in tabular form, with rows and columns.

Rows are the union of the indexes of the pandas Series in the dictionary (if the Series have no explicit index, default integer indexes are used).
Columns are the keys of the dictionary. Older pandas versions sorted them alphabetically; recent versions keep the dictionary's insertion order.
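The row and column rules can be checked on a small case. This is a sketch with made-up Series (`left`, `right`); it assumes a recent pandas version, where columns follow the dictionary's insertion order rather than being sorted.

```python
import pandas as pd

left = pd.Series([1, 2], index=['a', 'b'])
right = pd.Series([3, 4], index=['b', 'c'])

# Keys are deliberately given in non-alphabetical order.
df_union = pd.DataFrame({'right': right, 'left': left})

# Row labels are the union of both indexes: a, b and c.
print(df_union.index.tolist())
# Column labels are the dictionary keys.
print(df_union.columns.tolist())
# Labels missing from one Series show up as NaN in that column.
print(df_union)
```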



In [73]:


shopping_carts = pd.DataFrame(items)
shopping_carts

        Bob      Mary
bike   10.0       NaN
car     NaN   15000.0
house   NaN  400000.0
pants  20.0       4.0
pen     NaN       3.0
watch  50.0       NaN


In [53]:
# Series created without an explicit index get default integer indexes
no_index = {'Bob':pd.Series([1, 2, 3, 4, 5, 6, 7]), 'Mary':pd.Series([10, 20, 30])}

pp(no_index)

df = pd.DataFrame(no_index)
pp(df)
df

--------------------

 {'Bob': 0    1
1    2
2    3
3    4
4    5
5    6
6    7
dtype: int64, 'Mary': 0    10
1    20
2    30
dtype: int64}
--------------------

    Bob  Mary
0    1  10.0
1    2  20.0
2    3  30.0
3    4   NaN
4    5   NaN
5    6   NaN
6    7   NaN


   Bob  Mary
0    1  10.0
1    2  20.0
2    3  30.0
3    4   NaN
4    5   NaN
5    6   NaN
6    7   NaN


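In a notebook, a DataFrame on the last line of a cell is rendered as a formatted table without a print statement.

When the Series have no explicit index, pandas gives each one a default integer index (0, 1, 2, ...) and lines the values up from position 0. The shorter Series is padded with NaN.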
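A small sketch of that padding behaviour, using made-up Series of unequal length (the names `short_and_long` and `padded` are illustrative). It also shows why Mary's values above print as 10.0 rather than 10: NaN is a float, so a padded column is stored as floating point.

```python
import pandas as pd

short_and_long = {
    'long': pd.Series([1, 2, 3, 4]),
    'short': pd.Series([10, 20]),
}
padded = pd.DataFrame(short_and_long)

# 'short' has no values for rows 2 and 3, so those cells are NaN.
print(padded['short'].isna().sum())

# The padded column becomes a float column; the complete one stays integer.
print(padded.dtypes)
```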

In [54]:
shopping_carts.columns



Index(['Bob', 'Mary'], dtype='object')

In [55]:
shopping_carts.index


Index(['bike', 'car', 'house', 'pants', 'pen', 'watch'], dtype='object')

In [56]:
shopping_carts.values

array([[1.0e+01,     nan],
       [    nan, 1.5e+04],
       [    nan, 4.0e+05],
       [2.0e+01, 4.0e+00],
       [    nan, 3.0e+00],
       [5.0e+01,     nan]])

In [57]:
shopping_carts.shape

(6, 2)

In [58]:
shopping_carts.ndim

2

In [59]:
shopping_carts.size

12

In [60]:
bobs_shopping = pd.DataFrame(items, columns=['Bob'])
bobs_shopping

       Bob
bike    10
pants   20
watch   50


In [61]:
big_ticket = pd.DataFrame(items, index=['car', 'house'])
big_ticket

       Bob    Mary
car    NaN   15000
house  NaN  400000


In [62]:
# select Mary's column only, so the variable is named after Mary
mary_big_ticket = pd.DataFrame(items, index=['car', 'house'], columns=['Mary'])
mary_big_ticket

         Mary
car     15000
house  400000


## Manually create DataFrames from a dictionary of lists ##

1) Create a dictionary of lists (or arrays)
2) Pass it to pd.DataFrame

The lists must all have the same length: unlike pandas Series, plain lists carry no index that pandas could use to align and pad them.
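The length requirement can be seen by deliberately passing lists of unequal length. This is a sketch with hypothetical data (`uneven`); the exact error message may differ between pandas versions, so only the exception type is relied on.

```python
import pandas as pd

# Hypothetical data: the two lists have different lengths on purpose.
uneven = {'Integers': [3, 4, 5], 'Floats': [1.0, 2.0]}

try:
    pd.DataFrame(uneven)
    failed = False
except ValueError as err:
    failed = True
    print('pandas refused the input:', err)

# Wrapping each list in a Series lets pandas align by index and pad with NaN.
as_series = {name: pd.Series(values) for name, values in uneven.items()}
fixed = pd.DataFrame(as_series)
print(fixed)
```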

In [63]:
data = {'Integers': [3, 4, 5, 6], 'Floats':[1.0, 2.0, 3.0, 4.0]}

data_df = pd.DataFrame(data)
data_df
                    

   Integers  Floats
0         3     1.0
1         4     2.0
2         5     3.0
3         6     4.0


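Because the values are plain lists rather than pandas Series, the rows get default integer indexes (0, 1, 2, ...).

Custom row labels can be added with the index argument.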
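Besides passing `index=` to the constructor, row labels can be assigned to an existing DataFrame through its `index` attribute. A minimal sketch, using a hypothetical two-row frame named `scores`:

```python
import pandas as pd

scores = pd.DataFrame({'Integers': [3, 4], 'Floats': [1.0, 2.0]})

# Without an explicit index, pandas numbers the rows 0, 1, ...
print(scores.index.tolist())

# Assigning to .index relabels the rows in place; the list length
# must match the number of rows.
scores.index = ['first', 'second']
print(scores.loc['second', 'Integers'])
```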

In [66]:
data_df = pd.DataFrame(data, index=['first', 'second', 'third', 'fourth'])
data_df

        Integers  Floats
first          3     1.0
second         4     2.0
third          5     3.0
fourth         6     4.0


In [68]:
list_of_dics = [{'pants':100, 'shirt':15}, {'tv':800, 'laptop':1000}]
df = pd.DataFrame(list_of_dics)
df

   pants  shirt     tv  laptop
0  100.0   15.0    NaN     NaN
1    NaN    NaN  800.0  1000.0


In [70]:
df = pd.DataFrame(list_of_dics, index=['clothing', 'electronics'])
df

             pants  shirt     tv  laptop
clothing     100.0   15.0    NaN     NaN
electronics    NaN    NaN  800.0  1000.0
