In [2]:
## Importing csv for summary of functions
import pandas as pd
p4k = pd.read_csv("p4kreviews.csv",encoding='latin1',index_col=0)

## pandas Overview 


Making the move from R to python, I feel out of place without my familiar tidyverse of packages for data maniuplation and visualization. As such, I've been spending a lot of time learning [pandas](https://pandas.pydata.org/), the most popular data analysis and manipulation tool in python. 

This post is meant to serve as an overview of pandas functionality as well as serve as a personal reference. To demonstrate pandas, I've chosen to use a Kaggle [dataset](https://www.kaggle.com/nolanbconaway/pitchfork-data) that compiles over 18k music reviews from the Pitchfork website. 

A copy of this .ipynb can be found on [here](https://github.com/rsolter/Udemy-Courses/blob/master/Udemy%20-%20Data%20Analysis%20with%20Pandas%20and%20Python/Summary.ipynb) my git repository for the Udemy pandas course





### Sections

1. [Inspecting a dataframe](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
2. [Selecting Columns or Rows from dataframe](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
3. [Adding or Deleting Columns and Rows](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
4. [Filtering](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
5. [Ranking & Sorting](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
6. [Working with NAs](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
7. [Unique and Duplicate values](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
8. [Working with Indexes](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
9. [Renaming Labels and Columns](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
10. [Sampling](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
11. [Grouping](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
12. [Dates and Times](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
13. [Text Functions](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
14. [Merging, Joining, and Concatenating](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)

**Some useful links:**

- [Official Pandas Documentation](https://pandas.pydata.org/)

- [Comparison with R/R libraries](https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/comparison_with_r.html?highlight=arrange)

- [On Method Chaining in pandas](https://towardsdatascience.com/the-unreasonable-effectiveness-of-method-chaining-in-pandas-15c2109e3c69)

### Inspecting a dataframe

Below is a preview of the dataset which includes each album's score on a 10 point scale, artist name, album name, genre, review date, and text of the review. The best column refers to whether or not the album was designated a 'best new music' label.

key methods: .head(), .describe(), .info(), .shape, .dtypes, .columns(), .value_counts()

In [3]:
p4k.head() ## shows first five rows

Unnamed: 0,album,artist,best,date,genre,review,score
1,A.M./Being There,Wilco,1,December 6 2017,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0
2,No Shame,Hopsin,0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5
3,Material Control,Glassjaw,0,December 6 2017,Rock,"On their first album in 15 years, the Long Isl...",6.6
4,Weighing of the Heart,Nabihah Iqbal,0,December 6 2017,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7
5,The Visitor,Neil Young / Promise of the Real,0,December 5 2017,Rock,"While still pointedly political, Neil Youngs ...",6.7


In [4]:
p4k.describe().transpose() ## Provides a summary of quantitative columns, extra transpose() method chained along

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
best,19555.0,0.053183,0.224405,0.0,0.0,0.0,0.0,1.0
score,19555.0,7.027446,1.277544,0.0,6.5,7.3,7.8,10.0


In [5]:
p4k.info() ## Provides data type and non-null counts for each column 

<class 'pandas.core.frame.DataFrame'>
Int64Index: 19555 entries, 1 to 19555
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   album   19550 non-null  object 
 1   artist  19555 non-null  object 
 2   best    19555 non-null  int64  
 3   date    19555 non-null  object 
 4   genre   19555 non-null  object 
 5   review  19554 non-null  object 
 6   score   19555 non-null  float64
dtypes: float64(1), int64(1), object(5)
memory usage: 1.2+ MB


In [6]:
p4k.shape ## Returns a tuple with dimensions of a dataset

(19555, 7)

In [7]:
p4k.dtypes ## Returns the data types for each column

album      object
artist     object
best        int64
date       object
genre      object
review     object
score     float64
dtype: object

In [8]:
p4k.columns ## Returns a list with the column names

Index(['album', 'artist', 'best', 'date', 'genre', 'review', 'score'], dtype='object')

### Selecting columns or rows from a dataframe

**Selecting columns** by name is done by passing the column(s) quoted name into brackets. 

In [9]:
p4k['album']

1                   A.M./Being There
2                           No Shame
3                   Material Control
4              Weighing of the Heart
5                        The Visitor
                    ...             
19551                           1999
19552                 Let Us Replay!
19553    Singles Breaking Up, Vol. 1
19554                    Out of Tune
19555      Left for Dead in Malaysia
Name: album, Length: 19555, dtype: object

In [10]:
p4k[['album','artist']]

Unnamed: 0,album,artist
1,A.M./Being There,Wilco
2,No Shame,Hopsin
3,Material Control,Glassjaw
4,Weighing of the Heart,Nabihah Iqbal
5,The Visitor,Neil Young / Promise of the Real
...,...,...
19551,1999,Cassius
19552,Let Us Replay!,Coldcut
19553,"Singles Breaking Up, Vol. 1",Don Caballero
19554,Out of Tune,Mojave 3


A range of columns can also be selected using the colon (:)

In [11]:
p4k.loc[:,'artist':]

Unnamed: 0,artist,best,date,genre,review,score
1,Wilco,1,December 6 2017,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0
2,Hopsin,0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5
3,Glassjaw,0,December 6 2017,Rock,"On their first album in 15 years, the Long Isl...",6.6
4,Nabihah Iqbal,0,December 6 2017,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7
5,Neil Young / Promise of the Real,0,December 5 2017,Rock,"While still pointedly political, Neil Youngs ...",6.7
...,...,...,...,...,...,...
19551,Cassius,0,January 26 1999,Electronic,"Well, it's been two weeks now, and I guess it'...",4.8
19552,Coldcut,0,January 26 1999,Electronic,The marketing guys of yer average modern megac...,8.9
19553,Don Caballero,0,January 12 1999,Experimental,"Well, kids, I just went back and re-read my re...",7.2
19554,Mojave 3,0,January 12 1999,Rock,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3


**Selecting rows** can be done with the .iloc() method which can be sliced with a colon (:)


In [12]:
# Returning the first row (as a pandas series)
p4k.iloc[0]

album                                      A.M./Being There
artist                                                Wilco
best                                                      1
date                                        December 6 2017
genre                                                  Rock
review    Best new reissue 1 / 2 Albums Newly reissued a...
score                                                     7
Name: 1, dtype: object

In [13]:
# Returning the first 10 rows (as a pandas dataframe)
p4k.iloc[0:9]

Unnamed: 0,album,artist,best,date,genre,review,score
1,A.M./Being There,Wilco,1,December 6 2017,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0
2,No Shame,Hopsin,0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5
3,Material Control,Glassjaw,0,December 6 2017,Rock,"On their first album in 15 years, the Long Isl...",6.6
4,Weighing of the Heart,Nabihah Iqbal,0,December 6 2017,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7
5,The Visitor,Neil Young / Promise of the Real,0,December 5 2017,Rock,"While still pointedly political, Neil Youngs ...",6.7
6,Perfect Angel,Minnie Riperton,1,December 5 2017,Pop/R&B,Best new reissue A deluxe reissue of Minnie Ri...,9.0
7,Everyday Is Christmas,Sia,0,December 5 2017,Pop/R&B,Sias shiny Christmas album feels inconsistent...,5.8
8,Zaytown Sorority Class of 2017,Zaytoven,0,December 5 2017,Rap,The prolific Atlanta producer enlists 17 women...,6.2
9,Songs of Experience,U2,0,December 4 2017,Rock,"Years in the making, U2s 14th studio album fi...",5.3


Counting non-numeric data with **.value_counts()**

In [14]:
p4k['genre'].value_counts()
# p4k['genre'].value_counts()/p4k.shape[0] -- getting proportions instead of counts

Rock            6958
Electronic      4020
None            2324
Experimental    1699
Rap             1481
Pop/R&B         1157
Metal            781
Folk/Country     700
Jazz             257
Global           178
Name: genre, dtype: int64

### Adding or Deleting Columns and Rows

Adding a new column with a universal value:

In [15]:
p4k["Review Language"] = 'ENG'
p4k.head()

Unnamed: 0,album,artist,best,date,genre,review,score,Review Language
1,A.M./Being There,Wilco,1,December 6 2017,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0,ENG
2,No Shame,Hopsin,0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5,ENG
3,Material Control,Glassjaw,0,December 6 2017,Rock,"On their first album in 15 years, the Long Isl...",6.6,ENG
4,Weighing of the Heart,Nabihah Iqbal,0,December 6 2017,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7,ENG
5,The Visitor,Neil Young / Promise of the Real,0,December 5 2017,Rock,"While still pointedly political, Neil Youngs ...",6.7,ENG


Second method for adding column with loc

In [16]:
p4k.insert(loc=0,column="Parent Company", value="Vox")
p4k.head()

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
1,Vox,A.M./Being There,Wilco,1,December 6 2017,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0,ENG
2,Vox,No Shame,Hopsin,0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5,ENG
3,Vox,Material Control,Glassjaw,0,December 6 2017,Rock,"On their first album in 15 years, the Long Isl...",6.6,ENG
4,Vox,Weighing of the Heart,Nabihah Iqbal,0,December 6 2017,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7,ENG
5,Vox,The Visitor,Neil Young / Promise of the Real,0,December 5 2017,Rock,"While still pointedly political, Neil Youngs ...",6.7,ENG


Adding a row with **append()**

In [17]:
new_review = {'Parent Company':'Vox','album':'Yandhi','artist':'Kanye West',
              'best':'1','date':'December 31 2050','genre':'Rap','review':'BEST.ALBUM.EVER',
              'score':10.0,'Review Language':'ENG'}

In [18]:
p4k.append(new_review,ignore_index=True) # ignore_index allows the new row(s) to be inserted seemlessly 

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
0,Vox,A.M./Being There,Wilco,1,December 6 2017,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0,ENG
1,Vox,No Shame,Hopsin,0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5,ENG
2,Vox,Material Control,Glassjaw,0,December 6 2017,Rock,"On their first album in 15 years, the Long Isl...",6.6,ENG
3,Vox,Weighing of the Heart,Nabihah Iqbal,0,December 6 2017,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7,ENG
4,Vox,The Visitor,Neil Young / Promise of the Real,0,December 5 2017,Rock,"While still pointedly political, Neil Youngs ...",6.7,ENG
...,...,...,...,...,...,...,...,...,...
19551,Vox,Let Us Replay!,Coldcut,0,January 26 1999,Electronic,The marketing guys of yer average modern megac...,8.9,ENG
19552,Vox,"Singles Breaking Up, Vol. 1",Don Caballero,0,January 12 1999,Experimental,"Well, kids, I just went back and re-read my re...",7.2,ENG
19553,Vox,Out of Tune,Mojave 3,0,January 12 1999,Rock,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3,ENG
19554,Vox,Left for Dead in Malaysia,Neil Hamburger,0,January 5 1999,,Neil Hamburger's third comedy release is a des...,6.5,ENG


Dropping rows and columns with **.drop()**

In [68]:
p4k.drop(1) # returns a dataframe w/out the observation with index label '1' 

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language,score_rank
2,Vox,No Shame,Hopsin,0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5,ENG,19131.0
3,Vox,Material Control,Glassjaw,0,December 6 2017,Rock,"On their first album in 15 years, the Long Isl...",6.6,ENG,13879.0
4,Vox,Weighing of the Heart,Nabihah Iqbal,0,December 6 2017,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7,ENG,5691.0
5,Vox,The Visitor,Neil Young / Promise of the Real,0,December 5 2017,Rock,"While still pointedly political, Neil Youngs ...",6.7,ENG,13370.0
6,Vox,Perfect Angel,Minnie Riperton,1,December 5 2017,Pop/R&B,Best new reissue A deluxe reissue of Minnie Ri...,9.0,ENG,425.0
...,...,...,...,...,...,...,...,...,...,...
19551,Vox,1999,Cassius,0,January 26 1999,Electronic,"Well, it's been two weeks now, and I guess it'...",4.8,ENG,18368.0
19552,Vox,Let Us Replay!,Coldcut,0,January 26 1999,Electronic,The marketing guys of yer average modern megac...,8.9,ENG,587.0
19553,Vox,"Singles Breaking Up, Vol. 1",Don Caballero,0,January 12 1999,Experimental,"Well, kids, I just went back and re-read my re...",7.2,ENG,9807.0
19554,Vox,Out of Tune,Mojave 3,0,January 12 1999,Rock,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3,ENG,15110.0


In [69]:
p4k.drop("Parent Company", axis=1)
# or
p4k.drop("Parent Company", axis="columns")

Unnamed: 0,album,artist,best,date,genre,review,score,Review Language,score_rank
1,A.M./Being There,Wilco,1,December 6 2017,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0,ENG,11104.0
2,No Shame,Hopsin,0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5,ENG,19131.0
3,Material Control,Glassjaw,0,December 6 2017,Rock,"On their first album in 15 years, the Long Isl...",6.6,ENG,13879.0
4,Weighing of the Heart,Nabihah Iqbal,0,December 6 2017,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7,ENG,5691.0
5,The Visitor,Neil Young / Promise of the Real,0,December 5 2017,Rock,"While still pointedly political, Neil Youngs ...",6.7,ENG,13370.0
...,...,...,...,...,...,...,...,...,...
19551,1999,Cassius,0,January 26 1999,Electronic,"Well, it's been two weeks now, and I guess it'...",4.8,ENG,18368.0
19552,Let Us Replay!,Coldcut,0,January 26 1999,Electronic,The marketing guys of yer average modern megac...,8.9,ENG,587.0
19553,"Singles Breaking Up, Vol. 1",Don Caballero,0,January 12 1999,Experimental,"Well, kids, I just went back and re-read my re...",7.2,ENG,9807.0
19554,Out of Tune,Mojave 3,0,January 12 1999,Rock,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3,ENG,15110.0


In [73]:
# multiple columns
p4k.drop(["Review Language","Parent Company","score_rank"], axis="columns",inplace=True)
p4k

KeyError: "['Review Language' 'Parent Company' 'score_rank'] not found in axis"

### Filtering

Filtering can be achieved with boolean conditions as well as the following methods: .isin(), .between(), .where(), .query().

**Boolean filtering** 

In [20]:
p4k[p4k["artist"]=="Prince"]

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
1599,Vox,"One Nite Alone, The Aftershow: It Ain't Over!",Prince,1,September 1 2016,Pop/R&B,Best new reissue Originally released as part o...,8.6,ENG
2042,Vox,"Sign ""O"" the Times",Prince,0,April 30 2016,Pop/R&B,Choosing a single high point from Prince's glo...,10.0,ENG
2043,Vox,1999,Prince,0,April 30 2016,Pop/R&B,1999 is the greatest album ever made about par...,10.0,ENG
2047,Vox,Dirty Mind,Prince,0,April 29 2016,Pop/R&B,Princes first fully actualized album is an un...,10.0,ENG
2048,Vox,Controversy,Prince,0,April 29 2016,Pop/R&B,Controversy emerged in 1981 at a pivotal time ...,9.0,ENG
2444,Vox,HITNRUN Phase Two,Prince,0,January 8 2016,Pop/R&B,The second of Prince's HITNRUN series is anoth...,4.7,ENG
2779,Vox,HITNRUN Phase One,Prince,0,September 10 2015,Pop/R&B,"Prince's new effort, exclusive to Jay Z's Tida...",4.5,ENG
12334,Vox,Planet Earth,Prince,0,July 23 2007,Pop/R&B,So far this year Prince has wowed at the Super...,4.8,ENG
13400,Vox,Ultimate Prince,Prince,0,September 5 2006,Pop/R&B,This Prince best of covers the Warner Brothers...,8.6,ENG
13946,Vox,3121,Prince,0,March 20 2006,Pop/R&B,"On his latest release, the rock legend betters...",6.0,ENG


In [21]:
p4k[p4k["genre"] == "Global"] 
#p4k[p4k["genre"] != "Metal"] 

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
218,Vox,Ash,Ibeyi,1,October 4 2017,Global,"Best new music On their second album, the Fren...",8.3,ENG
238,Vox,La Confusion,Amadou & Mariam,0,September 29 2017,Global,The long-married Malian duo confronts the coun...,7.7,ENG
324,Vox,The Source,Tony Allen,0,September 6 2017,Global,"On this hybrid album of jazz and Afrobeat, lon...",7.7,ENG
355,Vox,Gypsy Woman,Joe Bataan,0,August 27 2017,Global,"Singer, songwriter, pianist, and bandleader Jo...",8.2,ENG
448,Vox,Sounds from the Other Side,WizKid,0,August 1 2017,Global,Emerging as a pop star from Nigerias Afrobeat...,7.4,ENG
...,...,...,...,...,...,...,...,...,...
18978,Vox,Liberation Afro Beat Vol. 1,Antibalas Afrobeat Orchestra,0,January 16 2001,Global,Music is a political statement. This fact is i...,6.0,ENG
19257,Vox,Living in the Flood,Horace Andy,0,March 31 2000,Global,"Alright, no fooling around. Anyone whose prima...",7.0,ENG
19261,Vox,Permutation,Bill Laswell,0,March 31 2000,Global,"If there's one thing Bill Laswell knows, it's ...",6.0,ENG
19276,Vox,Expensive Shit/He Miss Road,Fela Kuti,0,March 21 2000,Global,Afro-beat pioneer Fela Kuti was never more pis...,8.5,ENG


In [22]:
p4k[p4k["score"]>9.5]

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
14,Vox,Master of Puppets,Metallica,1,December 2 2017,Metal,"Best new reissue In 1986, Metallica released i...",10.0,ENG
128,Vox,I Can Hear the Heart Beating as One,Yo La Tengo,0,October 29 2017,Rock,"Twenty years on from its original release, Yo ...",9.7,ENG
154,Vox,The Queen Is Dead,The Smiths,0,October 22 2017,Rock,"Newly reissued as a boxed set, the Smiths 198...",10.0,ENG
506,Vox,Appetite for Destruction,Guns N' Roses,0,July 16 2017,Rock,The debut from Guns N' Roses was a watershed m...,10.0,ENG
566,Vox,Purple Rain Deluxe  Expanded Edition,Prince / The Revolution,1,June 26 2017,Pop/R&B,"Best new reissue In 1984, Purple Rain turned P...",10.0,ENG
...,...,...,...,...,...,...,...,...,...
19064,Vox,Kid A,Radiohead,0,October 2 2000,Rock,I had never even seen a shooting star before. ...,10.0,ENG
19172,Vox,The Moon & Antarctica,Modest Mouse,0,June 13 2000,Rock,It's not very exciting behind the scenes at Pi...,9.8,ENG
19236,Vox,Animals,Pink Floyd,0,April 25 2000,Rock,It begins somewhere for everyone. There's the ...,10.0,ENG
19383,Vox,Emergency & I,The Dismemberment Plan,0,September 30 1999,Rock,The Short Review:\r\nIf you consider yourself ...,9.6,ENG


In [23]:
condition1 = p4k["score"]>9.3
condition2 = p4k["genre"] == "Global"

p4k[condition1 & condition2] # filtering on both conditions

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
1568,Vox,Caetano Veloso,Caetano Veloso,0,September 11 2016,Global,"In 1968, the Brazilian pop singer began a Trop...",9.4,ENG


In [24]:
p4k[condition1 | condition2] # filtering on either conditions

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
14,Vox,Master of Puppets,Metallica,1,December 2 2017,Metal,"Best new reissue In 1986, Metallica released i...",10.0,ENG
119,Vox,Walk Among Us,Misfits,0,October 31 2017,Metal,"They were outliers when they started, but by t...",9.4,ENG
128,Vox,I Can Hear the Heart Beating as One,Yo La Tengo,0,October 29 2017,Rock,"Twenty years on from its original release, Yo ...",9.7,ENG
154,Vox,The Queen Is Dead,The Smiths,0,October 22 2017,Rock,"Newly reissued as a boxed set, the Smiths 198...",10.0,ENG
218,Vox,Ash,Ibeyi,1,October 4 2017,Global,"Best new music On their second album, the Fren...",8.3,ENG
...,...,...,...,...,...,...,...,...,...
19473,Vox,Agaetis Byrjun,Sigur Rós,0,June 1 1999,Rock,Icelandic lore tells of the Hidden People who ...,9.4,ENG
19475,Vox,Livro,Caetano Veloso,0,June 1 1999,Global,I heard somewhere that a person's tastes chang...,9.0,ENG
19490,Vox,Mule Variations,Tom Waits,0,April 27 1999,Rock,I once took a poetry workshop taught by a guy ...,9.5,ENG
19520,Vox,Brand New Secondhand,Roots Manuva,0,March 23 1999,Electronic,"For politcially unaware, socially unconscious,...",9.5,ENG


The **.isin()** method

In [25]:
p4k[p4k["genre"].isin(["Global","Jazz"])] 

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
28,Vox,4444,Sam Gendel,0,November 29 2017,Jazz,Sam Gendels smoothly psychedelic debut is dom...,7.0,ENG
84,Vox,Tauhid/Jewels of Thought/Deaf Dumb Blind (Summ...,Pharoah Sanders,1,November 10 2017,Jazz,Best new reissue Best new reissue 1 / 3 Albums...,8.2,ENG
157,Vox,The Centennial Trilogy,Christian Scott aTunde Adjuah,0,October 21 2017,Jazz,1 / 3 Albums On the three albums that compose ...,7.6,ENG
178,Vox,The Magic City / My Brother the Wind Vol. 1,Sun Ra and His Arkestra,0,October 16 2017,Jazz,1 / 2 Albums Sun Ra manifested an ecstatic vis...,8.5,ENG
208,Vox,Dreams and Daggers,Cécile McLorin Salvant,0,October 7 2017,Jazz,The young jazz singers live double album show...,7.6,ENG
...,...,...,...,...,...,...,...,...,...
19261,Vox,Permutation,Bill Laswell,0,March 31 2000,Global,"If there's one thing Bill Laswell knows, it's ...",6.0,ENG
19276,Vox,Expensive Shit/He Miss Road,Fela Kuti,0,March 21 2000,Global,Afro-beat pioneer Fela Kuti was never more pis...,8.5,ENG
19280,Vox,Treader,Spring Heel Jack,0,March 21 2000,Jazz,Isn't it kind of alarming that the once cuttin...,5.4,ENG
19467,Vox,Interstellar Space Revisited: The Music of Joh...,Gregg Bendian / Nels Cline,0,June 8 1999,Jazz,A friend of mine once remarked about the later...,7.9,ENG


The **.between()** method

In [26]:
p4k[p4k["score"].between(3.4,3.6)] 

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
2,Vox,No Shame,Hopsin,0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5,ENG
1200,Vox,December 99th,Yasiin Bey,0,January 2 2017,Rap,"On December 99th, Yasiin Bey (fka Mos Def) sou...",3.5,ENG
1348,Vox,Collage,The Chainsmokers,0,November 9 2016,Electronic,The massive lite-EDM duo the Chainsmokers port...,3.5,ENG
1775,Vox,New Introductory Lectures on the System of Tra...,Kel Valhaal,0,July 15 2016,Experimental,Liturgy frontman Hunter Hunt-Hendrix's latest ...,3.5,ENG
1808,Vox,Primary Colours,Magic!,0,July 6 2016,Pop/R&B,"After scoring a #1 hit with 2014's ""Rude,"" th...",3.5,ENG
...,...,...,...,...,...,...,...,...,...
18797,Vox,Bleed American,Jimmy Eat World,0,August 21 2001,Rock,Are you a 15-year-old TRL addict looking for a...,3.5,ENG
19072,Vox,Dream Signals in Full Circles,Tristeza,0,September 26 2000,Rock,"In the beginning, there was Nothing. And it wa...",3.5,ENG
19336,Vox,The Past Was Faster,Kelley Stoltz,0,December 14 1999,Rock,There's an upside and a downside to the perpet...,3.5,ENG
19394,Vox,Cobra and Phases Group Play Voltage in the Mil...,Stereolab,0,September 21 1999,Experimental,"""Okay, Brent, this is getting really old."" ""Wh...",3.4,ENG


The **.where()** method which returns the original dataframe with NAs in rows that don't meet the criteria

In [27]:
p4k.where(p4k["genre"]=="Rap")

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
1,,,,,,,,,
2,Vox,No Shame,Hopsin,0.0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5,ENG
3,,,,,,,,,
4,,,,,,,,,
5,,,,,,,,,
...,...,...,...,...,...,...,...,...,...
19551,,,,,,,,,
19552,,,,,,,,,
19553,,,,,,,,,
19554,,,,,,,,,


The **.query()** method is very readable, but since the conditions are wrapped in single quotes, it will not work if there are spaces in the column names

In [28]:
p4k.query('best==1.0')

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
1,Vox,A.M./Being There,Wilco,1,December 6 2017,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0,ENG
6,Vox,Perfect Angel,Minnie Riperton,1,December 5 2017,Pop/R&B,Best new reissue A deluxe reissue of Minnie Ri...,9.0,ENG
14,Vox,Master of Puppets,Metallica,1,December 2 2017,Metal,"Best new reissue In 1986, Metallica released i...",10.0,ENG
19,Vox,Kick,INXS,1,December 1 2017,Rock,Best new reissue The 30th-anniversary edition ...,8.4,ENG
34,Vox,Utopia,Björk,1,November 27 2017,Electronic,"Best new music Filled with flute and birdsong,...",8.4,ENG
...,...,...,...,...,...,...,...,...,...
17501,Vox,Do You Party?,The Soft Pink Truth,1,February 4 2003,Electronic,Best new music There will be big fun in town t...,8.4,ENG
17511,Vox,You Forgot It in People,Broken Social Scene,1,February 2 2003,Rock,Best new music It's a bit late to be talking a...,9.2,ENG
17551,Vox,Everything Is Good Here/Please Come Home,The Angels of Light,1,January 20 2003,,"Best new music To a certain extent, most of us...",8.6,ENG
17554,Vox,Mount Eerie,The Microphones,1,January 20 2003,Experimental,Best new music Growing up in the shadow of Mt....,8.9,ENG


### Ranking & Sorting

Sorting of values is done with the **.sort_values()** method or **.sort_index()**

In [35]:
p4k.sort_values("score",
               ascending=False, # default is ascending=True
               na_position="first", # position of na values. options include "first","last"
               inplace=True) # argument to replace 'p4k' dataframe with sorted output

In [36]:
p4k.sort_values(['score','artist'],ascending=[False,False]) # multiple parameters to sort by

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
1162,Vox,Germfree Adolescents,X-Ray Spex,0,January 15 2017,Rock,"X-Ray Spexs debut album is a brash, vivid mas...",10.0,ENG
13782,Vox,Pink Flag / Chairs Missing / 154,Wire,0,May 5 2006,Rock,1 / 3 Albums Wire were born at the dawn of pun...,10.0,ENG
6049,Vox,The Disintegration Loops,William Basinski,1,November 19 2012,Experimental,Best new reissue The four volumes of William B...,10.0,ENG
18222,Vox,Yankee Hotel Foxtrot,Wilco,0,April 21 2002,Rock,"Myth, it has been said, is the buried part of ...",10.0,ENG
1004,Vox,Weezer (Blue Album),Weezer,0,February 26 2017,Rock,"Weezers 1994 debut, filled with geeky humor, ...",10.0,ENG
...,...,...,...,...,...,...,...,...,...
15690,Vox,Travistan,Travis Morrison,0,September 27 2004,Pop/R&B,After a prestigious and fruitful career fronti...,0.0,ENG
19228,Vox,NYC Ghosts & Flowers,Sonic Youth,0,April 30 2000,Rock,"No, I have not forgotten to put the numbers in...",0.0,ENG
15048,Vox,Relaxation of the Asshole,Robert Pollard,0,April 20 2005,Rock,If more drunks would learn from Robert Pollard...,0.0,ENG
17050,Vox,Liz Phair,Liz Phair,0,June 24 2003,Rock,It could be said that Liz Phair's greatest ass...,0.0,ENG


In [40]:
p4k.sort_index(inplace=True) # reverting to original dataframe by sorting by index 
p4k

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
1,Vox,A.M./Being There,Wilco,1,December 6 2017,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0,ENG
2,Vox,No Shame,Hopsin,0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5,ENG
3,Vox,Material Control,Glassjaw,0,December 6 2017,Rock,"On their first album in 15 years, the Long Isl...",6.6,ENG
4,Vox,Weighing of the Heart,Nabihah Iqbal,0,December 6 2017,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7,ENG
5,Vox,The Visitor,Neil Young / Promise of the Real,0,December 5 2017,Rock,"While still pointedly political, Neil Youngs ...",6.7,ENG
...,...,...,...,...,...,...,...,...,...
19551,Vox,1999,Cassius,0,January 26 1999,Electronic,"Well, it's been two weeks now, and I guess it'...",4.8,ENG
19552,Vox,Let Us Replay!,Coldcut,0,January 26 1999,Electronic,The marketing guys of yer average modern megac...,8.9,ENG
19553,Vox,"Singles Breaking Up, Vol. 1",Don Caballero,0,January 12 1999,Experimental,"Well, kids, I just went back and re-read my re...",7.2,ENG
19554,Vox,Out of Tune,Mojave 3,0,January 12 1999,Rock,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3,ENG


Ranking is done with **.rank()** method, while individual extreme records can be returned with **.nlargest()** and **.nsmallest()** 

In [43]:
p4k['score'].rank()

1         7976.0
2          402.0
3         5494.0
4        13486.5
5         5932.0
          ...   
19551     1133.0
19552    18931.0
19553     9370.0
19554     4286.0
19555     5069.5
Name: score, Length: 19555, dtype: float64

By default, **.rank()** assign average rankings to values that are equivalent. Different ranking methods can be applied by using the method parameter:

In [52]:
p4k['score'].rank(method='min',ascending=False) # Min assigns lowest ranking for observations with the same values 

1         7500.0
2          379.0
3         5311.0
4        13108.0
5         5678.0
          ...   
19551     1078.0
19552    18893.0
19553     8991.0
19554     4126.0
19555     4829.0
Name: score, Length: 19555, dtype: float64

In [56]:
# Adding score_rank to the dataframe as a new column
p4k['score_rank']=p4k['score'].rank(method='min',ascending=False) 
p4k.sort_values("score_rank", ascending=True, na_position="first")  

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language,score_rank
9787,Vox,Abbey Road,The Beatles,0,September 10 2009,Rock,"The perfect ending to a recording career, this...",10.0,ENG,1.0
2280,Vox,Off the Wall,Michael Jackson,1,February 24 2016,Pop/R&B,Best new reissue Off the Wall is the sound of ...,10.0,ENG,1.0
7272,Vox,The Smile Sessions,The Beach Boys,1,November 2 2011,Rock,"Best new reissue Conceived, recorded, and ulti...",10.0,ENG,1.0
9784,Vox,The Stone Roses,The Stone Roses,1,September 11 2009,Rock,Best new reissue A badly needed remaster of th...,10.0,ENG,1.0
9261,Vox,Ladies and Gentlemen We are Floating in Space ...,Spiritualized,1,March 2 2010,Experimental,"Best new reissue This new deluxe, limited prod...",10.0,ENG,1.0
...,...,...,...,...,...,...,...,...,...,...
13301,Vox,Shine On,Jet,0,October 2 2006,Rock,,0.0,ENG,19550.0
19228,Vox,NYC Ghosts & Flowers,Sonic Youth,0,April 30 2000,Rock,"No, I have not forgotten to put the numbers in...",0.0,ENG,19550.0
12223,Vox,This Is Next,Various Artists,0,August 22 2007,,Basically an ADA sampler with a promotional st...,0.0,ENG,19550.0
17050,Vox,Liz Phair,Liz Phair,0,June 24 2003,Rock,It could be said that Liz Phair's greatest ass...,0.0,ENG,19550.0


In [65]:
p4k.nsmallest(n=4, columns="score")
# Alternative method which returns the records' index labels
#p4k["score"].nsmallest(n=4)

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language,score_rank
12223,Vox,This Is Next,Various Artists,0,August 22 2007,,Basically an ADA sampler with a promotional st...,0.0,ENG,19550.0
13301,Vox,Shine On,Jet,0,October 2 2006,Rock,,0.0,ENG,19550.0
15048,Vox,Relaxation of the Asshole,Robert Pollard,0,April 20 2005,Rock,If more drunks would learn from Robert Pollard...,0.0,ENG,19550.0
15690,Vox,Travistan,Travis Morrison,0,September 27 2004,Pop/R&B,After a prestigious and fruitful career fronti...,0.0,ENG,19550.0


In [67]:
#p4k.nlargest(n=4, columns="score")
# Alternative method which returns the records' index labels
p4k["score"].nlargest(n=4)

14     10.0
154    10.0
506    10.0
566    10.0
Name: score, dtype: float64

### Working with NAs


There are a variety of methods to identify, remove, and replace NA values

In [79]:
# adding a NA row to the data.frame
import numpy as np
NA_Row = {'album':np.nan,'artist':np.nan,'best':np.nan,'genre':np.nan,'review':np.nan,'score':np.nan}
NA_Row

{'album': nan,
 'artist': nan,
 'best': nan,
 'genre': nan,
 'review': nan,
 'score': nan}

In [84]:
p4k = p4k.append(NA_Row,ignore_index=True)
p4k

Unnamed: 0,album,artist,best,date,genre,review,score
0,A.M./Being There,Wilco,1.0,December 6 2017,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0
1,No Shame,Hopsin,0.0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5
2,Material Control,Glassjaw,0.0,December 6 2017,Rock,"On their first album in 15 years, the Long Isl...",6.6
3,Weighing of the Heart,Nabihah Iqbal,0.0,December 6 2017,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7
4,The Visitor,Neil Young / Promise of the Real,0.0,December 5 2017,Rock,"While still pointedly political, Neil Youngs ...",6.7
...,...,...,...,...,...,...,...
19551,Let Us Replay!,Coldcut,0.0,January 26 1999,Electronic,The marketing guys of yer average modern megac...,8.9
19552,"Singles Breaking Up, Vol. 1",Don Caballero,0.0,January 12 1999,Experimental,"Well, kids, I just went back and re-read my re...",7.2
19553,Out of Tune,Mojave 3,0.0,January 12 1999,Rock,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3
19554,Left for Dead in Malaysia,Neil Hamburger,0.0,January 5 1999,,Neil Hamburger's third comedy release is a des...,6.5


In [85]:

nba.dropna() # drops any value with a NA
nba.dropna(how="all", inplace=True) #only removes rows with NAs in all values

# Remove columns with any na values -- drops Salary and College
nba.dropna(axis=1)

# Only dropping observations where there are nulls in "Salary" column
nba.dropna(subset=["Salary"]) # Drops John Holland


nba.fillna(value=0) # will replace all NAs within the datafram with 0 (not ideal)



.isnull()

.notnull()

Unnamed: 0,album,artist,best,date,genre,review,score
0,A.M./Being There,Wilco,1.0,December 6 2017,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0
1,No Shame,Hopsin,0.0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5
2,Material Control,Glassjaw,0.0,December 6 2017,Rock,"On their first album in 15 years, the Long Isl...",6.6
3,Weighing of the Heart,Nabihah Iqbal,0.0,December 6 2017,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7
4,The Visitor,Neil Young / Promise of the Real,0.0,December 5 2017,Rock,"While still pointedly political, Neil Youngs ...",6.7
...,...,...,...,...,...,...,...
19550,1999,Cassius,0.0,January 26 1999,Electronic,"Well, it's been two weeks now, and I guess it'...",4.8
19551,Let Us Replay!,Coldcut,0.0,January 26 1999,Electronic,The marketing guys of yer average modern megac...,8.9
19552,"Singles Breaking Up, Vol. 1",Don Caballero,0.0,January 12 1999,Experimental,"Well, kids, I just went back and re-read my re...",7.2
19553,Out of Tune,Mojave 3,0.0,January 12 1999,Rock,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3


### Unique and Duplicate values


In [None]:
.duplicated()
df["First Name"].duplicated(keep="first") # by default, only the first observation of a value is not seen as a duplicate
df["First Name"].duplicated(keep="last") # running from the bottom up
df["First Name"].duplicated(keep= False) # identifys all values that are duplciated
df[df["First Name"].duplicated(keep= False)] # keeping any values that are duplicated
df[~df["First Name"].duplicated(keep=False)] # '~' negates to get all values which have no duplicates


len(df.drop_duplicates()) ## length is the same, because duplicates are identified across all columns
df.drop_duplicates(subset=["First Name"], keep = "first") # only dropping duplicates in the Names column, keeping first obs
df.drop_duplicates(subset=["First Name"], keep = False) # dropping all duplicated names
df.drop_duplicates(subset = ["Team"], keep=False) # each team has more than one row, so empty df is returned
df.drop_duplicates(subset = ["Team", "First Name"]) # Can drop duplicates across multiple columns



In [None]:
# unique() returns an array of unique values in a series

df["Gender"].unique() # returns unique values
df["Team"].unique()
len(df["Team"].unique())
df.nunique() # counts unique values

## The two methods give different counts because .nunique() drops NAs by default
df.nunique(dropna=False) # counts unique values without NAs



### Working with Indexes


### Renaming Labels and Columns


### Sampling


### Grouping


### Data and Times


### Text Functions


### Merging, Joining, and Concatenating


### 1.Series

_A note on attributes and methods:_
An attribute is something that bound to an object, while a method is a procedure or action. Also, attributes have no parantheses, attributes require them
    
#### Series attributes and methods, explanations where necessary:
    
   - series.head
   - series.tail
   - len(series) - _Return length of series including NA/null observations_
   - sorted(series) - _Sorts values_
   - list(series) - _Converts series to a list_
   - dict(series) - _Turns the series into a dictionary object where the the existing index becomes the dictionary key_
   - min(series) -_For strings, will return first value sorted alphabetically_
   - max(series) - _For strings, will return last value sorted alphabetically _
   - series.values - _values attribute_
   - series.index - _values attribute_
   - series.dtype - _data type_
   - series.is_unique - _Returns unique values_
   - series.shape - _dimensions of series/dataframe_
   - series.size - _number of elements (rows*columns)_
   - series.count() - _Returns number of non-NA/null observations_
   - series.name - _name of the series_
   - series.sort_values(inplace=T) - _sorts values, inplace=T replaces original values with sorted ones_
   - series.sort_index(inplace=T) - _sorts index, inplace=T replaces original values with sorted ones_
   - "Value" in series 
   - series['n'] - _returns nth element by index_
   - series['index label'] - _returns element by index value name_
   - series.sum()
   - series.mean()
   - series.std()
   - series.min()
   - series.max()
   - series.median()
   - series.mode()
   - series.describle() - _Similar to summary() in R, returns key descriptive stats_
   - series.idmax() - _Return the row label of the maximum value._
   - series.idmin() - _Return the row label of the minimum value._
   - series.value_counts() - _Similar to table() in base R_
    
#### Apply method - invokes a function on a series of values

   - [Documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.apply.html)
   - series.apply(FUNCTION, args(,.. additional arguments)


#### lambda
   - [Explanation](https://stackabuse.com/lambda-functions-in-python/)
   - In Python, the lambda keyword declares an anonymous (no name) function, which are referred to as "lambda functions". Although syntactically they look different, lambda functions behave in the same way as regular functions that are declared using the def keyword.
    

#### .map()
    - .map()

In [41]:
range(12)

range(0, 12)

In [42]:
artists = p4k["artist"]
scores = p4k["score"]
p4k.head()

Unnamed: 0,album,artist,best,date,genre,review,score
1,A.M./Being There,Wilco,1,December 6 2017,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0
2,No Shame,Hopsin,0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5
3,Material Control,Glassjaw,0,December 6 2017,Rock,"On their first album in 15 years, the Long Isl...",6.6
4,Weighing of the Heart,Nabihah Iqbal,0,December 6 2017,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7
5,The Visitor,Neil Young / Promise of the Real,0,December 5 2017,Rock,"While still pointedly political, Neil Youngs ...",6.7


In [43]:
## Methods and Attributes

artists.count() 
artists.value_counts() 
artists.head
artists.tail
len(artists)
sorted(artists)  
list(artists) 
dict(artists) 
min(artists)  
max(artists) 
artists.values 
artists.index
artists.dtype
artists.is_unique 
artists.shape 
artists.size 
artists.name 
artists.sort_values()  
artists.sort_index() 
"David Bowie" in artists # in operator
artists[100] 
#artists['David Bowie'] 
scores.sum()
scores.mean()
scores.std()
scores.min()
scores.max()
scores.median()
scores.mode()
scores.describe() 
scores.idxmax("Score") 
scores.idxmin("Score") 

12223

In [44]:
## Apply Method - invokes a function on a series of values

# returns nth character of each artist name, with index starting at 0
def n_char(string,n):
    if len(string)<n+1:
        return ''
    else:
        return(string[n])
    
 
## Returning character from artist string at positiong 3: 
artists.apply(n_char, args=(3,))



1        c
2        s
3        s
4        i
5        l
        ..
19551    s
19552    d
19553     
19554    a
19555    l
Name: artist, Length: 19555, dtype: object

In [45]:
## Lambda - 
artists.apply(lambda x: x[0])

1        W
2        H
3        G
4        N
5        N
        ..
19551    C
19552    C
19553    D
19554    M
19555    N
Name: artist, Length: 19555, dtype: object

### 2. Data Frames

#### Basic Information
   - df.shape
   - df.dtypes
   - df.columns
   - df.axes
   - df.info
   - df.sum( , axes={1,0})
    
#### Selecting column(s)
    
   - df["c1"] or df.c1, df[["c1","c2"]]
    
#### Adding a new column
    
   - df["newCol"] = {value}
 
#### Broadcasting Operations
   - df[value].add(5) or df[value] + 5 (accounts for NAs)
   - df[value].mul(3) 

#### Dropping Rows with Null Values  
  
   - df.dropna() - _drops any observations with an NA values. Similar to R's complete.observations_
   - df.dropna(how="all") - _only drops rows with all NA values_
   - df.dropna(axis=1) - _drops columns with any NA values_
    
   - df.fillna(value=0) - _fills all values in the dataframe_
   - df["column1"].fillna(0,inplace=True) - _column by column approach_
    
    
#### Converting Types using as.type() method
   - df["Float_Score"].astype("int") - _converts FLOAT to INT. Note that there is not inplace arg_
   - as.type("category") - _can be used to convert a string to a R factor-like variable. Saves space._  
    
#### Sorting/Ranking Values
   - df.sort_values([Co1],[Col2], ascending=[True,False])
   - df.rank() - _provides rankings as integers_
    
#### Filtering based upon a condition
   - df["Col1"]=="Value" or df["Col1"]<=22 will return a boolean
   - df[df["Col1"]=="Value"] will return a filtered dataset
   - Alternatively, filter1 = df["Col1"]=="Value", df[filter1]
   - Conditions can be strung together with AND (&), OR (|)
    
#### .isin() Method

   - df["Col1].isin(["Value1","Value2"]) can be used to filter/extract rows in a dataframe
    
#### .isnull(), .notnull() Methods
   - df["Col1"].isnull() - _produces a boolean series where Col1 value is null_
   - df["Col1"].notnull() - _produces a boolean series where Col1 value is NOT null_

#### .between() Method
   - df["Col1"].between(200,300) - _returns a boolean series of observations falling between 200 and 300, inclusive. Works on times, dates, and numerics_
   
#### .duplicated() Method
   - df["Col1"].duplicate(keep="first") - _Idenifies duplicates and removes them, by default keeps the first observation. keep=False will return all observations that have duplicates_
   
#### .drop_duplicates() Method
   - df.drop_duplicate() - _Applies to a df across all columns, where as the .duplicated() method above applies to a series._
   - df.drop_duplicates(subset=["Col1"], keep = "first") - Can be applied to specific columns_
   
#### .unique() Method
   - df["Col1"].unique() - counts unique values for one column

#### .nunique() Method
   - df.nunique() - counts unique values across columns
 
#### .set_index() Method
   - df.set_index("Col1") - _replaces existing index with values from a column_
   
#### .reset_index() Method
   - df.reset_index(drop=True) - _resets index and drops values_
   - df.reset_index(drop=True) - _getting back to original_
   
#### Retrieving Rows by Index Label with .loc()
   - .loc uses brackets, parantheses 
   - df.loc["indexLabel"] - Retrieves row with specific index label
   
#### Retrieving Rows by Index Position with .loc()
   - df.iloc[100] - _retrieves row with specific index number(s)_
   - df.iloc[60:120] - _retrieving a range_
   - df.iloc[12,1:3] - _retrieving a certain row, multiple columns_

#### Identifying Individual cells, setting new values
   - df.iloc[0, 1] == "New Value" 
   - df.iloc[2, 0:] == "New row value"

#### Renaming Index Labels or Columns in a Dataframe
   - df.rename(columns = {"Col1" : "NewCol1", "Col2" : "NewCol2"}, inplace=T) - _renaming of columns are done with a dictionary 


#### Deleting Rows or Columns from a Dataframe
   - dr.drop["Row1", axis=0] - _drops a row by name_
   - df.drop("Col1", axis=1) - _drop a column_
   - del df["Col1"] - _alternative method_
   
#### Random Samples with .sample() Method
   - df.sample(n) - _sample n random rows_
   - df.sample(frac=0.25) - _samples a random 25%_
   - df.sample(axis=) - _can sample rows or cols with axis_
   
#### The .nsmallest() and .nlargest() Methods
   - df.nsmallest(n=3, columns="Col1 ) - _returns 3 smallest values for Col1_
   - df.nlargest(n=3, columns="Col1 ) - _returns 3 largest values for Col1_
   
   
#### Filtering with the .where() Method()
   - df.where(df[Col1]=="Value") - _returns the original data frame with NAS in rows that don't meet the filtering criteria_
   
#### The .query() Method
   - df.query('Col'=="Value") - _Similar to filter, only returns matching rows_
   

#### .copy() Method
   - df.copy() - _create a copy of the object's indices and data_
   
   

### 3. Strings

#### Common methods   
   - string.lower()
   - string.upper()
   - string.title() - _Capitalizes first letter of each word_
   - len(string)
   - string.strip() - _Strips white space_
   - string.lstrip() - _Strips white space on the left_
   - string.rstrip() - _Strips white space on the right_
   

#### .str.replace() method
   - "Hello world".replace("l","!") - _Two arguments: pattern, substitute_
   
   
#### Filtering with string methods   
   - df["Col1"].str.lower().str.contains("water") - _Searches Col1 for strings that contain 'water'_
   - Other alternate searches: str.startswith(), str.endswith()

#### Splitting strings by characters 
   - "Hello my name is Ravi".split(" ") # single arg is the delimiter/sep
   - **Expand parameter**: df[["First Name", "Last Name"]] = df["Name"].str.split(",", expand=True) - _Breaks apart Name into first and last name columns_
   - **n parameter** - _n equals the maximum number of splits_



### 4. Multi Index
   
#### Creating a multi-index with set_index() 
   - df.set_index(keys=["Col1","Col2"], inplace=True) - _Creates multi-level index_
    
#### The .get_level_values() Method
   - df.index.get_level_values() - _returns index values_

#### The .set_names() Method
   - df.index_set_names(["Name1","Name2"]) - _renames index levels_
   
#### The sort_index() Method
   - df.sort_index(ascending=[True,False]) - _sorts indexes_

#### The .transpose() method and MultiIndex on Column Level
   - dfT = df.transpose() - _transposes data.frame, including indexes_ 

#### The .swaplevel() Method
   - df.swaplevel() - _swamps levels of multi-index_

#### The .stack() Method
   - Similar to R's tidy::gather()
   - [Official documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.stack.html)

#### The .unstack() Method 
   - Similar to R's tidyr::spread()
   - 

#### The .pivot() Method
   - df.pivot(index="Col1", columns"Col2", values="Col3") - _Returns reshaped DataFrame organized by given index/column names_ 
   - [Official documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot.html)

#### The .pivot_table() Method
   - df.pivot_table(index="Col1", columns"Col2", values="Col3", aggfunc="mean") - _Create a spreadsheet-style pivot table as a DataFrame_
   - [Official documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html)

#### The pd.melt() Method
   - Essentially the inverse of pivot_table. Converting into a longer table
   - [Official documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.melt.html)


### 5. Group by

#### The pd.melt() Method
   - 
   - 

### 6. Merging, Joining, Concatenating
   - 
   - 
   - 
   - 
   - 

### 7. Merging, Joining, Concatenating  
   - 
   - 
   - 
   - 
   - 