# &#9657; Create Test Objects

#### 5 columns and 20 rows of random floats
```python
pd.DataFrame(np.random.rand(20,5)) 
```
#### Create a series from an iterable my_list
```python
pd.Series(my_list)
```
#### Add a date index
```python
df.index = pd.date_range('1900/1/30', periods=df.shape[0])	
```
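
The three snippets above can be combined into one runnable sketch. The seeded generator (`np.random.default_rng(0)`) is swapped in for `np.random.rand` only to make the values reproducible, and the contents of `my_list` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Seeded generator so the random values are reproducible (the seed is arbitrary)
rng = np.random.default_rng(0)

# 20 rows x 5 columns of floats in [0, 1)
df = pd.DataFrame(rng.random((20, 5)))

# Series built from a plain Python list (illustrative data)
my_list = [10, 20, 30]
s = pd.Series(my_list)

# Replace the default integer index with one date per row
df.index = pd.date_range("1900-01-30", periods=df.shape[0])

print(df.shape)      # (20, 5)
print(df.index[0])   # 1900-01-30 00:00:00
```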

# &#9657; Viewing/Inspecting Data

#### First n rows of the DataFrame
```python
df.head(n)	
```
#### Last n rows of the DataFrame
```python
df.tail(n)	
```
#### Number of rows and columns
```python
df.shape	
```
#### Index, Datatype and Memory information
```python
df.info()	
```
#### Summary statistics for numerical columns
```python
df.describe()	
```
#### View unique values and counts
```python
s.value_counts(dropna=False)	
```
#### Unique values and counts for all columns
```python
df.apply(pd.Series.value_counts)	
```
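
A small sketch of what the two `value_counts` calls return, using made-up data. The names `counts` and `per_column` are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", "a", np.nan, "a"])

# dropna=False keeps NaN as its own category in the counts
counts = s.value_counts(dropna=False)
print(counts["a"])   # 3

df = pd.DataFrame({"x": [1, 1, 2], "y": [2, 2, 2]})

# One column of counts per original column;
# a value that never occurs in a column shows up as NaN
per_column = df.apply(pd.Series.value_counts)
print(per_column)
```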

# &#9657; Selection

#### Returns column with label col as Series
```python
df[col]	
```
#### Returns columns as a new DataFrame
```python
df[[col1, col2]]	
```
#### Selection by position
```python
s.iloc[0]	
```
#### Selection by index
```python
s.loc['index_one']	
```
#### First row
```python
df.iloc[0,:]	
```
#### First element of first column
```python
df.iloc[0,0]	
```
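
The selection forms above, tried on a tiny made-up frame. The column names and index labels are assumptions chosen to match the snippets.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]},
    index=["index_one", "index_two", "index_three"],
)

col = df["a"]           # single label -> Series
sub = df[["a", "b"]]    # list of labels -> DataFrame

first_by_position = col.iloc[0]         # position-based lookup
first_by_label = col.loc["index_one"]   # label-based lookup

first_row = df.iloc[0, :]   # Series holding the first row
top_left = df.iloc[0, 0]    # scalar at row 0, column 0
```

`iloc` always counts positions from 0, while `loc` looks up index labels, so the two agree here only because `"index_one"` happens to be the first label.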

# &#9657; Data Cleaning

#### Rename columns
```python
df.columns = ['a','b','c']	
```
#### Checks for null values, returns a Boolean array
```python
pd.isnull(df)
```
#### Opposite of pd.isnull()
```python
pd.notnull(df)
```
#### Drop all rows that contain null values
```python
df.dropna()
```
#### Drop all columns that contain null values
```python
df.dropna(axis=1)	
```
#### Drop all rows that have fewer than n non-null values
```python
df.dropna(thresh=n)
```
#### Replace all null values with x
```python
df.fillna(x)	
```
#### Replace all null values with the mean
```python
s.fillna(s.mean())	
```
#### Convert the datatype of the series to float
```python
s.astype(float)	
```
#### Replace all values equal to 1 with 'one'
```python
s.replace(1,'one')	
```
#### Replace all 2 with 'two' and 3 with 'three'
```python
s.replace([2,3],['two', 'three'])	
```
#### Mass renaming of columns
```python
df.rename(columns=lambda x: x + 1)	
```
#### Selective renaming
```python
df.rename(columns={'old_name': 'new_name'})
```
#### Change the index
```python
df.set_index('column_one')	
```
#### Mass renaming of index
```python
df.rename(index=lambda x: x + 1)	
```
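
A short sketch of how the null-handling calls behave on a made-up frame with a few missing values. All names and values here are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [7.0, 8.0, 9.0],
})

null_mask = df.isnull()              # Boolean DataFrame, True where a value is missing
rows_complete = df.dropna()          # keeps only rows with no nulls (row 2)
cols_complete = df.dropna(axis=1)    # keeps only columns with no nulls ("c")
rows_thresh = df.dropna(thresh=2)    # keeps rows with at least 2 non-null values

# The mean skips NaN, so the gap in "a" is filled with (1.0 + 3.0) / 2
filled = df["a"].fillna(df["a"].mean())

renamed = df.rename(columns={"a": "alpha"})
```

None of these calls modify `df` in place; each returns a new object.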

# &#9657; Filter, Sort, and Groupby

#### Rows where the column col is greater than 0.6
```python
df[df[col] > 0.6]	
```
#### Rows where 0.8 > col > 0.6
```python
df[(df[col] > 0.6) & (df[col] < 0.8)]	
```
#### Sort values by col1 in ascending order
```python
df.sort_values(col1)	
```
#### Sort values by col2 in descending order
```python
df.sort_values(col2,ascending=False)	
```
#### Sort values by col1 in ascending order then col2 in descending order
```python
df.sort_values([col1,col2],ascending=[True,False])	
```
#### Returns a groupby object for values from one column
```python
df.groupby(col)	
```
#### Returns groupby object for values from multiple columns
```python
df.groupby([col1,col2])	
```
#### Returns the mean of the values in col2, grouped by the values in col1
```python
df.groupby(col1)[col2].mean()
```
#### Create a pivot table that groups by col1 and calculates the mean of col2 and col3
```python
df.pivot_table(index=col1,values=[col2,col3],aggfunc='mean')
```
#### Find the average across all columns for every unique col1 group
```python
df.groupby(col1).agg('mean')
```
#### Apply the function np.mean() across each column
```python
df.apply(np.mean)	
```
#### Apply the function np.max() across each row
```python
df.apply(np.max,axis=1)
```
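
The filter, sort, and group operations above, sketched on a small made-up frame. The column names `team`, `score`, and `age` are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["x", "x", "y", "y"],
    "score": [0.5, 0.7, 0.9, 0.65],
    "age": [30, 22, 25, 28],
})

# Combine conditions with & and wrap each one in parentheses
mid = df[(df["score"] > 0.6) & (df["score"] < 0.8)]

# team ascending, then score descending within each team
ordered = df.sort_values(["team", "score"], ascending=[True, False])

# Mean score per team
mean_score = df.groupby("team")["score"].mean()

# The same aggregation as a pivot table over two value columns
pivot = df.pivot_table(index="team", values=["score", "age"], aggfunc="mean")
```

The parentheses around each comparison matter because `&` binds more tightly than `>` and `<` in Python.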

# &#9657; Join/Combine

#### Add the rows in df2 to the end of df1 (columns should be identical)
```python
pd.concat([df1, df2])
```
#### Add the columns in df2 to the end of df1 (rows should be identical)
```python
pd.concat([df1, df2],axis=1)	
```
#### SQL-style join of the columns in df1 with the columns in df2, matching the values in df1's column col1 against the index of df2. The 'how' can be 'left', 'right', 'outer' or 'inner'
```python
df1.join(df2,on=col1, how='inner')	
```
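
A sketch comparing the combining operations on two made-up frames. `DataFrame.merge` does not appear in the list above; it is included here only as the column-to-column counterpart of `join`, and the column names are assumptions.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

# Stack rows; ignore_index=True renumbers the result from 0
stacked = pd.concat([df1, df1], ignore_index=True)

# Place frames side by side, aligned on the row index
side_by_side = pd.concat([df1, df2], axis=1)

# Column-to-column SQL-style join
merged = df1.merge(df2, on="key", how="inner")

# join() matches df1["key"] against the index of the other frame,
# so df2 is re-indexed by "key" first
joined = df1.join(df2.set_index("key"), on="key", how="inner")
```

Note that `side_by_side` ends up with two columns named `key`, because `concat` with `axis=1` aligns on the index and does not match on column values.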

# &#9657; Statistics 


#### Summary statistics for numerical columns
```python
df.describe()	
```
#### Returns the mean of all columns
```python
df.mean()	
```
#### Returns the correlation between columns in a DataFrame
```python
df.corr()	
```
#### Returns the number of non-null values in each DataFrame column
```python
df.count()	
```
#### Returns the highest value in each column
```python
df.max()	
```
#### Returns the lowest value in each column
```python
df.min()	
```
#### Returns the median of each column
```python
df.median()	
```
#### Returns the standard deviation of each column
```python
df.std()	
```
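
A brief check of how these statistics treat missing values, using made-up numbers. By default the reductions skip NaN, and `std()` computes the sample standard deviation (`ddof=1`).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, np.nan],
    "b": [2.0, 4.0, 6.0, 8.0],
})

non_null = df.count()    # "a" has 3 non-null values, "b" has 4
means = df.mean()        # NaN is skipped: mean of "a" is 2.0
medians = df.median()    # median of "b" is (4 + 6) / 2 = 5.0
stds = df.std()          # sample standard deviation of "a" is 1.0
corr = df.corr()         # pairwise correlation over rows where both values exist
```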

***

# &#9657; Python | Pandas DataFrame.where()

#### Example #1: Single Condition operation

* In this example, rows with a particular Team name are kept and all other rows are replaced by NaN using the .where() method.

```python
# importing pandas package 
import pandas as pd 

# making data frame from csv file 
data = pd.read_csv("nba.csv") 

# sorting dataframe 
data.sort_values("Team", inplace = True) 

# making boolean series for a team name 
filter = data["Team"]=="Atlanta Hawks"

# filtering data 
data.where(filter, inplace = True) 

# display 
data 
```


As shown in the output image, every row which doesn’t have Team = Atlanta Hawks is replaced with NaN.

![Temp.JPG](attachment:Temp.JPG)

#### Example #2: Multi-condition Operations

* Data is filtered on the basis of both Team and Age. Only the rows with Team name “Atlanta Hawks” and players older than 24 keep their values; all other rows are replaced by NaN.


```python
# importing pandas package 
import pandas as pd 
  
# making data frame from csv file 
data = pd.read_csv("nba.csv") 
  
# sorting dataframe 
data.sort_values("Team", inplace = True) 
  
# making boolean series for a team name and for age
filter1 = data["Team"] == "Atlanta Hawks"
filter2 = data["Age"] > 24

# filtering data on the basis of both filters
data.where(filter1 & filter2, inplace = True)
  
# display 
data 
```

##### Output:
As shown in the output image, only the rows with Team name “Atlanta Hawks” and players older than 24 keep their values; every other row is replaced with NaN.

![Temp.JPG](attachment:Temp.JPG)
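
Since `nba.csv` is not included here, the following sketch reproduces the idea on a small in-memory frame. The rows are made up and only stand in for the real file. It also contrasts `.where()` with plain Boolean indexing, which is an addition for comparison rather than part of the examples above.

```python
import pandas as pd

# Made-up rows standing in for nba.csv; names and values are illustrative only
data = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Team": ["Atlanta Hawks", "Atlanta Hawks", "Boston Celtics", "Boston Celtics"],
    "Age": [22, 27, 30, 24],
})

is_hawks = data["Team"] == "Atlanta Hawks"
over_24 = data["Age"] > 24

# where() keeps the shape of the frame: rows failing the condition become NaN
masked = data.where(is_hawks & over_24)

# Boolean indexing, by contrast, drops the non-matching rows
filtered = data[is_hawks & over_24]

print(masked.shape)     # (4, 3)
print(filtered.shape)   # (1, 3)
```

Use `.where()` when the original row positions must be preserved, and Boolean indexing when only the matching rows are needed.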