# NumPy and Pandas

## Pandas

A Pandas Series is a one-dimensional array-like object that can hold many data types, such as numbers or strings, and can optionally be given index labels.

In [1]:
import pandas as pd

In [2]:
groceries = pd.Series(data = [10, 20, "Yes", "No"], index = ['eggs', 'apples', 'milk', 'bread'])

In [3]:
groceries

eggs       10
apples     20
milk      Yes
bread      No
dtype: object

Just like NumPy ndarrays, Pandas Series have attributes that give us information about the series, such as its shape, size, and number of dimensions.

In [4]:
groceries.shape

(4,)

In [5]:
groceries.size

4

In [7]:
groceries.ndim

1

We can also use the attributes index and values to get the index labels and the data of a Series, respectively.

In [9]:
groceries

eggs       10
apples     20
milk      Yes
bread      No
dtype: object

In [10]:
groceries.values

array([10, 20, 'Yes', 'No'], dtype=object)

In [11]:
groceries.index

Index(['eggs', 'apples', 'milk', 'bread'], dtype='object')

In [13]:
# We can check if something is in our index by using the membership operator
'bread' in groceries

True

In [14]:
'cassava' in groceries

False
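Note that the membership operator looks at the index labels, not at the stored values. The short sketch below illustrates the difference; the variable names are chosen only for this example, and it assumes the same groceries Series as above.

```python
import pandas as pd

groceries = pd.Series(data=[10, 20, "Yes", "No"],
                      index=['eggs', 'apples', 'milk', 'bread'])

# `in` on a Series checks the index labels
label_found = 'milk' in groceries          # 'milk' is a label
value_as_label = 'Yes' in groceries        # 'Yes' is a value, not a label

# To search the data instead, look in .values or use isin()
value_found = 'Yes' in groceries.values    # searches the underlying array
value_mask = groceries.isin(['Yes'])       # boolean Series, one entry per element

print(label_found, value_as_label, value_found)
print(value_mask)
```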

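Pandas Series are ordered and mutable.
You can access elements either by index label or by numeric position. To remove any ambiguity about whether we are referring to an integer position or an index label, Pandas provides two attributes, loc (label-based) and iloc (position-based). Newer versions of Pandas emit a FutureWarning for positional access with plain brackets, such as groceries[3], so iloc is the safer choice for positions.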

In [15]:
groceries = pd.Series(data = [10, 20, "Yes", "No"], index = ['eggs', 'apples', 'milk', 'bread'])

In [16]:
groceries

eggs       10
apples     20
milk      Yes
bread      No
dtype: object

In [17]:
groceries["bread"]

'No'

In [18]:
groceries[3]

'No'

In [20]:
groceries[["bread", "milk"]]

bread     No
milk     Yes
dtype: object

In [21]:
groceries[[2,3]]

milk     Yes
bread     No
dtype: object

#### loc & iloc

In [23]:
groceries.iloc[[2,3]]

milk     Yes
bread     No
dtype: object

In [25]:
groceries.loc[["eggs", "bread"]]

eggs     10
bread    No
dtype: object
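The ambiguity is easiest to see when the index labels are themselves integers. The following is a minimal sketch using a hypothetical Series named scores (not part of the notebook above) whose integer labels do not match the positions.

```python
import pandas as pd

# Hypothetical Series: integer labels that differ from the positions
scores = pd.Series(data=[0.5, 0.7, 0.9], index=[2, 0, 1])

by_label = scores.loc[1]       # element whose label is 1 (the last one)
by_position = scores.iloc[1]   # element at position 1 (the second one)

print(by_label)      # 0.9
print(by_position)   # 0.7
```

With loc and iloc the intent is explicit, so the result does not depend on how Pandas interprets a bare integer.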

Like NumPy arrays, Pandas Series are mutable and ordered.

To delete elements from a Pandas Series, we use the drop() method.
* By default, drop() returns a new Series and leaves the original unchanged. To drop elements in place, we set inplace=True.

In [26]:
groceries.drop("apples")

eggs      10
milk     Yes
bread     No
dtype: object

In [33]:
groceries

eggs       10
apples     20
milk      Yes
bread      No
dtype: object

In [34]:
groceries.drop("eggs", inplace=True)

In [35]:
groceries

apples     20
milk      Yes
bread      No
dtype: object
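As an alternative to inplace=True, the result of drop() can be assigned to a new name, which keeps the original Series available. A small sketch, with variable names chosen for illustration only:

```python
import pandas as pd

groceries = pd.Series(data=[10, 20, "Yes", "No"],
                      index=['eggs', 'apples', 'milk', 'bread'])

# drop() returns a new Series; groceries itself is not modified
without_apples = groceries.drop("apples")

# A list of labels drops several elements at once
without_two = groceries.drop(["eggs", "bread"])

print(len(groceries), len(without_apples), len(without_two))  # 4 3 2
```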

#### Arithmetic operations on Pandas Series

Just like NumPy arrays, Pandas Series allow element-wise operations.

In [37]:
groceries

apples     20
milk      Yes
bread      No
dtype: object

In [38]:
groceries["milk"] = 45

In [39]:
groceries["bread"] = 15

In [40]:
groceries

apples    20
milk      45
bread     15
dtype: object

In [41]:
groceries  + 500

apples    520
milk      545
bread     515
dtype: object

In [42]:
groceries / 39

apples    0.512821
milk      1.153846
bread     0.384615
dtype: object
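The outputs above still report dtype: object because the Series originally contained strings, and replacing them with numbers does not change the dtype. If a numeric dtype is wanted, one option is pd.to_numeric. The sketch below rebuilds the three-element Series with an explicit object dtype to mimic that state; this setup is an assumption made for the example.

```python
import pandas as pd

# Mimic the state above: numeric values stored in an object-dtype Series
groceries = pd.Series(data=[20, 45, 15],
                      index=['apples', 'milk', 'bread'],
                      dtype=object)

# Convert to a numeric dtype so later arithmetic yields numeric results
numeric_groceries = pd.to_numeric(groceries)

print(groceries.dtype)                   # object
print(numeric_groceries.dtype)           # int64
print((numeric_groceries / 39).dtype)    # float64
```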

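We can also apply NumPy mathematical functions to a Series. They are applied element-wise and return a new Series with the same index.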

In [50]:
import numpy as np
fruits = pd.Series(data = [10, 6, 3], index = ['apples', 'oranges', 'bananas'])

In [51]:
np.exp(fruits)

apples     22026.465795
oranges      403.428793
bananas       20.085537
dtype: float64

In [53]:
np.sqrt(fruits)

apples     3.162278
oranges    2.449490
bananas    1.732051
dtype: float64

In [54]:
np.power(fruits, 2)

apples     100
oranges     36
bananas      9
dtype: int64

We can also perform arithmetic operations on a single element, or on a selection of elements, of a Series.

In [55]:
fruits.iloc[2] + 2

5

In [56]:
fruits.loc["oranges"] * 80

480

In [57]:
fruits.loc[["apples", "oranges"]] * 20

apples     200
oranges    120
dtype: int64

### Pandas DataFrames

Pandas DataFrames are two-dimensional data structures with labeled rows and columns that can hold many data types.

We can create a Pandas DataFrame manually or by loading data from a file. To create the DataFrame manually, we can use:

* a dictionary of Pandas Series as values
* a dictionary of lists as values
* a list of dictionaries
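The third option, a list of dictionaries, treats each dictionary as one row and uses its keys as column names. The following is a minimal sketch with made-up data; the names rows and store_items and the values in them are hypothetical and not taken from the notebook.

```python
import pandas as pd

# Hypothetical data: each dictionary is one row, keys become column names
rows = [{'bikes': 20, 'pants': 30, 'watches': 35},
        {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants': 5}]

# The index argument is optional; here it labels the two rows
store_items = pd.DataFrame(rows, index=['store 1', 'store 2'])

# Keys missing from a row (here 'glasses' for store 1) are filled with NaN
print(store_items)
```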

In [80]:
# from a dictionary
items = {'Bob' : pd.Series(data = [245, 25, 55], index = ['bike', 'pants', 'watch']),
         'Alice' : pd.Series(data = [40, 110, 500, 45], index = ['book', 'glasses', 'bike', 'pants'])}
# The index argument is optional; if omitted, Pandas uses a default integer index (0, 1, 2, ...).

In [69]:
shopping_carts = pd.DataFrame(items)

In [70]:
shopping_carts

           Bob  Alice
bike     245.0  500.0
book       NaN   40.0
glasses    NaN  110.0
pants     25.0   45.0
watch     55.0    NaN


Like NumPy arrays, Pandas DataFrames also have attributes such as shape, size, and ndim.


When creating a DataFrame, we can choose to keep only particular rows or columns by setting the index and/or columns arguments.

In [85]:
bob_shopping_cart = pd.DataFrame(items, columns=['Bob'])
bob_shopping_cart

       Bob
bike   245
pants   25
watch   55


In [87]:
sel_shopping_cart = pd.DataFrame(items, index = ['pants', 'book'])
sel_shopping_cart

        Bob  Alice
pants  25.0     45
book    NaN     40


In [88]:
alice_sel_shopping_cart = pd.DataFrame(items, index = ['glasses', 'bike'], columns = ['Alice'])
alice_sel_shopping_cart

         Alice
glasses    110
bike       500
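When the dictionary values are Series, the index argument selects rows by label, and a label that is not present in the data produces a row of NaN rather than an error. A short sketch of that behavior, using the same items dictionary and a hypothetical label 'hat':

```python
import pandas as pd

items = {'Bob': pd.Series(data=[245, 25, 55], index=['bike', 'pants', 'watch']),
         'Alice': pd.Series(data=[40, 110, 500, 45],
                            index=['book', 'glasses', 'bike', 'pants'])}

# 'bike' exists in both Series; 'hat' (hypothetical) exists in neither
cart = pd.DataFrame(items, index=['bike', 'hat'])

print(cart)
```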


In [90]:
data = {'Integers' : [1,2,3],
        'Floats' : [4.5, 8.2, 9.6]}
df = pd.DataFrame(data)
df

   Integers  Floats
0         1     4.5
1         2     8.2
2         3     9.6


In [92]:
mf = pd.DataFrame(data, index = ['label 1', 'label 2', 'label 3'])
mf

         Integers  Floats
label 1         1     4.5
label 2         2     8.2
label 3         3     9.6


In [72]:
shopping_carts.shape

(5, 2)

In [73]:
shopping_carts.size

10

In [74]:
shopping_carts.ndim

2

In [75]:
shopping_carts.index

Index(['bike', 'book', 'glasses', 'pants', 'watch'], dtype='object')

In [76]:
shopping_carts.columns

Index(['Bob', 'Alice'], dtype='object')

In [77]:
shopping_carts.values

array([[245., 500.],
       [ nan,  40.],
       [ nan, 110.],
       [ 25.,  45.],
       [ 55.,  nan]])
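These attributes are related: size is the number of rows times the number of columns, and it counts cells holding NaN as well. The sketch below checks this on the same shopping_carts DataFrame; the variable names are chosen for illustration.

```python
import pandas as pd

items = {'Bob': pd.Series(data=[245, 25, 55], index=['bike', 'pants', 'watch']),
         'Alice': pd.Series(data=[40, 110, 500, 45],
                            index=['book', 'glasses', 'bike', 'pants'])}
shopping_carts = pd.DataFrame(items)

n_rows, n_cols = shopping_carts.shape       # (rows, columns)
total_cells = shopping_carts.size           # includes cells that hold NaN
array_shape = shopping_carts.values.shape   # the NumPy array has the same shape

print(n_rows, n_cols, total_cells, array_shape)  # 5 2 10 (5, 2)
```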