# Operations

Many pandas operations are really useful but don't fall into any distinct category. This lecture walks through them:

In [56]:
import pandas as pd
df = pd.DataFrame({'col1':[1,2,3,4],'col2':[444,555,666,444],'col3':['abc','def','ghi','xyz']})
df.head()

|   | col1 | col2 | col3 |
|---|---|---|---|
| 0 | 1 | 444 | abc |
| 1 | 2 | 555 | def |
| 2 | 3 | 666 | ghi |
| 3 | 4 | 444 | xyz |


### Info on Unique Values

There are three main methods for finding unique values in a DataFrame.

Imagine you want all the unique values in `col2`. The `unique` method returns an array of them.

If you want the number of unique values rather than the array itself, there are two ways to get it: call `nunique`, or check the length of the array that `unique` returns. Either way the answer is 3, since `col2` has 3 unique values.

In [57]:
df['col2'].unique()

array([444, 555, 666])

In [58]:
# number of unique values

df['col2'].nunique()

3

In [59]:
# Another way to count the unique values

len(df['col2'].unique())

3

If you want a table of the unique values and how many times each one shows up, use the `value_counts()` method.

In [60]:
df['col2'].value_counts()

444    2
555    1
666    1
Name: col2, dtype: int64
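
As a small extension of the three methods above, the sketch below checks that `nunique()` agrees with `len(unique())` and uses the `normalize=True` option of `value_counts()` to report shares instead of raw counts. The variable names `n_unique` and `shares` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col2': [444, 555, 666, 444]})

# nunique() and len(unique()) agree when the column has no missing values
n_unique = df['col2'].nunique()
print(n_unique == len(df['col2'].unique()))  # True

# normalize=True reports each value's share of the rows instead of a count
shares = df['col2'].value_counts(normalize=True)
print(shares)  # 444 has a share of 0.5; 555 and 666 have 0.25 each
```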

### Selecting Data

In [61]:
#Select from DataFrame using criteria from multiple columns
newdf = df[(df['col1']>2) & (df['col2']==444)]

In [62]:
newdf

|   | col1 | col2 | col3 |
|---|---|---|---|
| 3 | 4 | 444 | xyz |
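
The selection above combines two conditions with `&` ("and"). A minimal sketch of the other common combinators, `|` ("or") and `~` ("not"), is shown here; the names `either` and `not_444` are made up for illustration. Each condition needs its own parentheses because these operators bind more tightly than comparisons.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# | means "or": rows where col1 > 2 or col2 equals 444
either = df[(df['col1'] > 2) | (df['col2'] == 444)]

# ~ negates a condition: rows where col2 is not 444
not_444 = df[~(df['col2'] == 444)]

print(either)   # rows with index 0, 2 and 3
print(not_444)  # rows with index 1 and 2
```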


### Applying Functions

You already know that you can grab a column and call a built-in method off of it, such as `sum`, which returns the sum of the column. But what if you want to apply your own custom function, such as `times2`? pandas can do that with the `apply` method.

Select the column, call `apply`, and pass in whatever function you want to apply. In this case we apply `times2`, which broadcasts the function to each element of `col1` and returns 2, 4, 6, 8.

You can also apply built-in functions such as `len`.

In [63]:
def times2(x):
    return x*2

In [64]:
df['col1'].apply(times2)

0    2
1    4
2    6
3    8
Name: col1, dtype: int64

In [65]:
df['col3'].apply(len)

0    3
1    3
2    3
3    3
Name: col3, dtype: int64

In [66]:
df['col1'].sum()

10

You can also multiply each value in a column by two without writing an entire function for it.

Just pass a lambda expression to `apply`. This is the sort of thing you will use all the time as you get more comfortable with pandas.

The ability to apply your own custom lambda expressions or functions is probably one of the most powerful features of pandas.

In [67]:
df['col2'].apply(lambda x: x*2)

0     888
1    1110
2    1332
3     888
Name: col2, dtype: int64
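
For simple arithmetic like doubling, pandas also lets you operate on the whole column directly, which gives the same result as `apply`. The sketch below compares the two; `applied` and `vectorized` are illustrative names, and `apply` remains the tool for logic that has no built-in column operation.

```python
import pandas as pd

df = pd.DataFrame({'col2': [444, 555, 666, 444]})

applied = df['col2'].apply(lambda x: x * 2)  # calls the lambda once per element
vectorized = df['col2'] * 2                  # one operation on the whole column

print(applied.equals(vectorized))  # True
```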

**Permanently Removing a Column**

`del` removes the column from `df` in place, so `col1` is gone until the DataFrame is re-created.

In [40]:
del df['col1']

In [41]:
df

|   | col2 | col3 |
|---|---|---|
| 0 | 444 | abc |
| 1 | 555 | def |
| 2 | 666 | ghi |
| 3 | 444 | xyz |
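
If you would rather keep the original DataFrame intact, one alternative to `del` is `DataFrame.drop`, which returns a new DataFrame by default. This is a sketch of that alternative, not the approach used in the cells above; `without_col1` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# drop returns a new DataFrame; df itself still has col1
without_col1 = df.drop(columns=['col1'])

print(list(without_col1.columns))  # ['col2', 'col3']
print(list(df.columns))            # ['col1', 'col2', 'col3']
```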


**Get column and index names:**

In [68]:
# to grab the column names

df.columns

Index(['col1', 'col2', 'col3'], dtype='object')

In [69]:
# Information about the index

df.index

RangeIndex(start=0, stop=4, step=1)
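
Both `df.columns` and `df.index` are pandas index objects. When a plain Python list is easier to work with, `tolist()` converts them, as in this short sketch (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

column_names = df.columns.tolist()  # plain Python list of column labels
row_labels = df.index.tolist()      # plain Python list of index labels

print(column_names)  # ['col1', 'col2', 'col3']
print(row_labels)    # [0, 1, 2, 3]
```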

**Sorting and Ordering a DataFrame:**

In [71]:
df

|   | col1 | col2 | col3 |
|---|---|---|---|
| 0 | 1 | 444 | abc |
| 1 | 2 | 555 | def |
| 2 | 3 | 666 | ghi |
| 3 | 4 | 444 | xyz |


In [70]:
# Sorting the DF by col2

df.sort_values(by='col2') #inplace=False by default

|   | col1 | col2 | col3 |
|---|---|---|---|
| 0 | 1 | 444 | abc |
| 3 | 4 | 444 | xyz |
| 1 | 2 | 555 | def |
| 2 | 3 | 666 | ghi |
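
`sort_values` sorts in ascending order by default. The sketch below shows two variations using its `ascending` parameter: a descending sort, and a sort on two columns where ties in `col2` are broken by `col1`. The names `descending` and `tie_broken` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# Largest col2 first
descending = df.sort_values(by='col2', ascending=False)

# Sort by col2 ascending, then break ties with col1 in descending order
tie_broken = df.sort_values(by=['col2', 'col1'], ascending=[True, False])

print(descending)
print(tie_broken)  # index order: 3, 0, 1, 2
```

The row index travels with each row, which is why the index values appear out of order after sorting.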


**Find Null Values or Check for Null Values**

In [46]:
df.isnull()

|   | col2 | col3 |
|---|---|---|
| 0 | False | False |
| 1 | False | False |
| 2 | False | False |
| 3 | False | False |


In [47]:
# Drop rows with NaN Values
df.dropna()

|   | col2 | col3 |
|---|---|---|
| 0 | 444 | abc |
| 1 | 555 | def |
| 2 | 666 | ghi |
| 3 | 444 | xyz |
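
The DataFrame above has no missing values, so `isnull()` is all `False` and `dropna()` changes nothing. To see both methods do real work, here is a sketch on a small DataFrame with two missing values, assumed purely for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, np.nan],
                   'col2': [np.nan, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# Count missing values per column
missing_per_column = df.isnull().sum()

# Keep only rows with no missing values
complete_rows = df.dropna()

# Keep only columns with no missing values
complete_columns = df.dropna(axis=1)

print(missing_per_column)              # col1: 1, col2: 1, col3: 0
print(list(complete_rows.index))       # [1, 2]
print(list(complete_columns.columns))  # ['col3']
```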


**Filling in NaN values with something else:**

In [48]:
import numpy as np

In [49]:
df = pd.DataFrame({'col1':[1,2,3,np.nan],
                   'col2':[np.nan,555,666,444],
                   'col3':['abc','def','ghi','xyz']})
df.head()

|   | col1 | col2 | col3 |
|---|---|---|---|
| 0 | 1.0 | NaN | abc |
| 1 | 2.0 | 555.0 | def |
| 2 | 3.0 | 666.0 | ghi |
| 3 | NaN | 444.0 | xyz |


In [50]:
df.fillna('FILL')

|   | col1 | col2 | col3 |
|---|---|---|---|
| 0 | 1 | FILL | abc |
| 1 | 2 | 555 | def |
| 2 | 3 | 666 | ghi |
| 3 | FILL | 444 | xyz |
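
Filling with a string like `'FILL'` mixes text into numeric columns. A common alternative, sketched here as one option rather than a recommendation, is to fill each numeric column with its own mean. The name `filled` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, np.nan],
                   'col2': [np.nan, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# Fill each numeric column with its own mean (mean() skips NaN by default)
filled = df.copy()
filled['col1'] = df['col1'].fillna(df['col1'].mean())
filled['col2'] = df['col2'].fillna(df['col2'].mean())

print(filled)  # col1 row 3 becomes 2.0, col2 row 0 becomes 555.0
```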


**Creating a Pivot Table:**

In [51]:
data = {'A':['foo','foo','foo','bar','bar','bar'],
        'B':['one','one','two','two','one','one'],
        'C':['x','y','x','y','x','y'],
        'D':[1,3,2,5,4,1]}

df = pd.DataFrame(data)

In [52]:
df

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | foo | one | x | 1 |
| 1 | foo | one | y | 3 |
| 2 | foo | two | x | 2 |
| 3 | bar | two | y | 5 |
| 4 | bar | one | x | 4 |
| 5 | bar | one | y | 1 |


In [53]:
df.pivot_table(values='D',index=['A', 'B'],columns=['C'])

| A | B | C = x | C = y |
|---|---|---|---|
| bar | one | 4.0 | 1.0 |
| bar | two | NaN | 5.0 |
| foo | one | 1.0 | 3.0 |
| foo | two | 2.0 | NaN |
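
In the pivot table above every (A, B, C) combination occurs at most once, so no aggregation is visible. By default `pivot_table` averages values that land in the same cell. The sketch below groups on `A` alone to make that visible, then switches to a sum with `aggfunc='sum'` and replaces missing combinations with `fill_value=0`. The names `by_mean`, `by_sum` and `no_gaps` are illustrative.

```python
import pandas as pd

data = {'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
        'B': ['one', 'one', 'two', 'two', 'one', 'one'],
        'C': ['x', 'y', 'x', 'y', 'x', 'y'],
        'D': [1, 3, 2, 5, 4, 1]}
df = pd.DataFrame(data)

# Default aggregation is the mean: foo/x averages 1 and 2
by_mean = df.pivot_table(values='D', index='A', columns='C')

# Same layout, but summing instead of averaging
by_sum = df.pivot_table(values='D', index='A', columns='C', aggfunc='sum')

# Original layout with missing combinations filled with 0 instead of NaN
no_gaps = df.pivot_table(values='D', index=['A', 'B'], columns=['C'],
                         fill_value=0)

print(by_mean)  # foo/x is 1.5, bar/y is 3.0
print(by_sum)   # foo/x is 3, bar/y is 6
print(no_gaps)
```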
