# Operations

pandas offers many operations that you will find useful. This section covers a few of the most important ones; feel free to explore the other functions as well.

In [52]:
# recreate the dataframe

   col1  col2 col3
0     1   444  abc
1     2   555  def
2     3   666  ghi
3     4   444  xyz


In [75]:
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})
df

   col1  col2 col3
0     1   444  abc
1     2   555  def
2     3   666  ghi
3     4   444  xyz


### Info on Unique Values

In [53]:
# Get the unique items in col2

array([444, 555, 666])

In [4]:
df.col2.unique()

array([444, 555, 666], dtype=int64)

In [54]:
# Get the number of unique items in col2

3

In [5]:
df.col2.nunique()

3

In [55]:
# Get the count of values

444    2
555    1
666    1
Name: col2, dtype: int64

In [9]:
df.col2.value_counts()

444    2
555    1
666    1
Name: col2, dtype: int64
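
The three calls above are closely related. As a minimal sketch (the standalone Series `s` and the variable names are illustrative, not part of the exercise), the number of unique values equals both the length of `unique()` and the number of rows in `value_counts()`:

```python
import pandas as pd

# Illustrative Series holding the same values as col2 in the exercise
s = pd.Series([444, 555, 666, 444], name='col2')

unique_vals = s.unique()    # NumPy array of distinct values, in order of appearance
n_unique = s.nunique()      # number of distinct values
counts = s.value_counts()   # occurrences per value, most frequent first

print(unique_vals)
print(n_unique)
print(counts)

# The three views agree with one another
print(len(unique_vals) == n_unique == len(counts))
```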

### Selecting Data

In [31]:
# Select from DataFrame using criteria from multiple columns
# Condition: col1 > 2 and col2 == 444

newdf = df[(df['col1'] > 2) & (df['col2'] == 444)]
newdf

   col1  col2 col3
3     4   444  xyz


In [57]:
newdf

   col1  col2 col3
3     4   444  xyz
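
The selection above combines two boolean masks with `&`. A minimal sketch of the same pattern using `|` (or) and `~` (not) may help; the sample frame is rebuilt so the block runs on its own, and the names `both`, `either`, and `neither` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# Each comparison yields a boolean Series. Wrap each one in parentheses,
# because & and | bind more tightly than > and ==.
both = df[(df['col1'] > 2) & (df['col2'] == 444)]        # rows matching both conditions
either = df[(df['col1'] > 2) | (df['col2'] == 444)]      # rows matching at least one
neither = df[~((df['col1'] > 2) | (df['col2'] == 444))]  # rows matching none

print(both)
print(either)
print(neither)
```

Python's `and` and `or` keywords do not work element-wise on a Series, which is why the bitwise operators are used here.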


### Applying Functions

In [58]:
# Create your custom function (note: x * 2 doubles each value; it does not square it)

In [33]:
dfsq = df['col1'].apply(lambda x: x * 2)
print(dfsq)

0    2
1    4
2    6
3    8
Name: col1, dtype: int64


In [59]:
# Apply your function to col1

0    2
1    4
2    6
3    8
Name: col1, dtype: int64
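
The cell above multiplies each value by 2, which doubles it rather than squaring it. As a sketch of the difference, here are both versions written as named functions; `times_two` and `square` are hypothetical helper names chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4]})

def times_two(x):
    """Return twice the input value."""
    return x * 2

def square(x):
    """Return the input value multiplied by itself."""
    return x ** 2

# apply() calls the function once per element of the Series
doubled = df['col1'].apply(times_two)
squared = df['col1'].apply(square)

print(doubled.tolist())
print(squared.tolist())
```

For simple arithmetic like this, the vectorized forms `df['col1'] * 2` and `df['col1'] ** 2` give the same results without `apply`.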

In [60]:
# Get the length of strings in col3 using a builtin function

0    3
1    3
2    3
3    3
Name: col3, dtype: int64

In [38]:
dflen = df['col3'].apply(len)
dflen

0    3
1    3
2    3
3    3
Name: col3, dtype: int64

In [76]:
# Same result using the vectorized .str accessor
df['col3'].str.len()

0    3
1    3
2    3
3    3
Name: col3, dtype: int64

In [61]:
# Get the sum of values in col1 using a builtin function

10

In [79]:
df['col1'].sum()

10

**Permanently Removing a Column**

In [62]:
# Delete a pandas column using the built-in del statement

In [43]:
del df['col1']
df

   col2 col3
0   444  abc
1   555  def
2   666  ghi
3   444  xyz


In [63]:
df

   col2 col3
0   444  abc
1   555  def
2   666  ghi
3   444  xyz
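
`del df['col1']` changes the DataFrame in place. As a sketch of the non-destructive alternative, `DataFrame.drop` returns a new frame and leaves the original untouched unless you reassign it; the names `trimmed` and `cols_after_drop` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# drop() returns a new DataFrame without the named column
trimmed = df.drop(columns=['col1'])
cols_after_drop = list(df.columns)  # original still has all three columns

# del removes the column from df itself
del df['col1']
cols_after_del = list(df.columns)

print(list(trimmed.columns))
print(cols_after_drop)
print(cols_after_del)
```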


**Get column and index names:**

In [64]:
# Get the column names (recreate the output below)

Index(['col2', 'col3'], dtype='object')

In [52]:
df.dtypes

col2     int64
col3    object
dtype: object

In [65]:
# Get the index (recreate the output below)

RangeIndex(start=0, stop=4, step=1)

In [67]:
df.columns


Index(['col2', 'col3'], dtype='object')
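
The two expected outputs in this part come from the `columns` and `index` attributes. A minimal sketch, rebuilding the two-column frame so it runs on its own (the names `column_names` and `index_labels` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

print(df.columns)  # Index object holding the column labels
print(df.index)    # RangeIndex holding the row labels

# Convert either one to a plain Python list when needed
column_names = df.columns.tolist()
index_labels = df.index.tolist()
```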

**Sorting and Ordering a DataFrame:**

In [66]:
df

   col2 col3
0   444  abc
1   555  def
2   666  ghi
3   444  xyz


In [58]:
df.sort_values(by=['col2'])

   col2 col3
0   444  abc
3   444  xyz
1   555  def
2   666  ghi


In [67]:
# Sort dataframe with values of col2
# inplace=False by default

   col2 col3
0   444  abc
3   444  xyz
1   555  def
2   666  ghi
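
Because `inplace=False` is the default, `sort_values` returns a sorted copy and leaves `df` in its original order. A sketch that also shows descending order with a second column as tie-breaker; the names `sorted_df` and `renumbered` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# Sort by col2 descending; break ties on col3 ascending
sorted_df = df.sort_values(by=['col2', 'col3'], ascending=[False, True])

# The original row labels travel with the rows; reset them if you want 0..n-1
renumbered = sorted_df.reset_index(drop=True)

print(sorted_df)
print(renumbered)
print(df)  # unchanged, still in the original order
```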


**Find Null Values or Check for Null Values**

In [68]:
# Check every cell for null values (your code goes here)

    col2   col3
0  False  False
1  False  False
2  False  False
3  False  False


In [61]:
df.isnull()

    col2   col3
0  False  False
1  False  False
2  False  False
3  False  False


In [69]:
# Drop rows with NaN Values

   col2 col3
0   444  abc
1   555  def
2   666  ghi
3   444  xyz


In [64]:
df.dropna(inplace=True)
df

   col2 col3
0   444  abc
1   555  def
2   666  ghi
3   444  xyz
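
The exercise frame contains no missing values, so `isnull()` is all `False` and `dropna()` changes nothing. To see the methods take effect, here is a sketch on a hypothetical frame, `df_missing`, with missing entries added purely for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing number and one missing string
df_missing = pd.DataFrame({'col2': [444, np.nan, 666, 444],
                           'col3': ['abc', 'def', None, 'xyz']})

# Count the missing values in each column
null_counts = df_missing.isnull().sum()

# Drop every row that contains at least one missing value
cleaned = df_missing.dropna()

# Alternatively, fill the gaps with a per-column default
filled = df_missing.fillna({'col2': 0, 'col3': 'missing'})

print(null_counts)
print(cleaned)
print(filled)
```

Both `dropna()` and `fillna()` return new frames here; `df_missing` keeps its gaps unless you pass `inplace=True` or reassign the result.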


# Great Job!