## pandas Overview 


Making the move from R to Python, I feel out of place without my familiar tidyverse packages for data manipulation and visualization. As such, I've been spending a lot of time learning [pandas](https://pandas.pydata.org/), the most popular data analysis and manipulation tool in Python. 

This post is meant to serve both as an overview of pandas functionality and as a personal reference. To demonstrate pandas, I've chosen to use a Kaggle [dataset](https://www.kaggle.com/nolanbconaway/pitchfork-data) that compiles over 18k music reviews from the Pitchfork website. 

A copy of this .ipynb can be found [here](https://github.com/rsolter/Udemy-Courses/blob/master/Udemy%20-%20Data%20Analysis%20with%20Pandas%20and%20Python/Summary.ipynb), in my git repository for the Udemy pandas course.



#### Importing data

In [2]:
import pandas as pd
import numpy as np
import datetime as dt

## Importing csv for summary of functions
p4k = pd.read_csv("p4kreviews.csv",
                  encoding='latin1',
                  index_col=0, # use the first column of the file as the index
                  parse_dates=["date"]) # parse the "date" column as datetimes on import
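
To see what `index_col` and `parse_dates` do without downloading the Kaggle file, here is a minimal sketch that reads a tiny in-memory CSV instead. The three rows are copied from the preview further down, and the `id` column name is my own assumption (the first column of the real file may be named differently).

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for p4kreviews.csv; "id" is an assumed column name
csv_text = """id,album,artist,date,score
1,A.M./Being There,Wilco,2017-12-06,7.0
2,No Shame,Hopsin,2017-12-06,3.5
3,Material Control,Glassjaw,2017-12-06,6.6
"""

sample = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,           # the first column ("id") becomes the row index
    parse_dates=["date"],  # the "date" column is converted to datetime64
)

print(sample.index.tolist())
print(sample.dtypes)
```

With `index_col=0` the `id` values become the row labels, which is why the dataframes in this post are labelled from 1 rather than 0.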



## Sections

1. [Inspecting a dataframe](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
2. [Selecting Columns or Rows from dataframe](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
3. [Adding or Deleting Columns and Rows](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
4. [Filtering](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
5. [Ranking & Sorting](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
6. [Working with NAs](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
7. [Unique and Duplicate values](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
8. [Working with Indexes](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
9. [Renaming Labels and Columns](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
10. [Sampling](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
11. [Grouping](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
12. [Dates and Times](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
13. [Text Functions](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
14. [Merging, Joining, and Concatenating](https://rsolter.github.io/python/pandas/reference/Pandas-Overview/)
15. Reshaping dataframes


**Some useful links:**

- [Official Pandas Documentation](https://pandas.pydata.org/)

- [Comparison with R/R libraries](https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/comparison_with_r.html?highlight=arrange)

- [On Method Chaining in pandas](https://towardsdatascience.com/the-unreasonable-effectiveness-of-method-chaining-in-pandas-15c2109e3c69)

### 1. Inspecting a dataframe

_key methods and attributes: .head(), .describe(), .info(), .shape, .dtypes, .columns, .value_counts()_

Below is a preview of the dataset, which includes each album's score on a 10-point scale, artist name, album name, genre, review date, and the text of the review. The best column indicates whether or not the album was given the 'best new music' label.



In [3]:
p4k.head() ## shows first five rows

Unnamed: 0,album,artist,best,date,genre,review,score
1,A.M./Being There,Wilco,1,2017-12-06,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0
2,No Shame,Hopsin,0,2017-12-06,Rap,"On his corrosive fifth album, the rapper takes...",3.5
3,Material Control,Glassjaw,0,2017-12-06,Rock,"On their first album in 15 years, the Long Isl...",6.6
4,Weighing of the Heart,Nabihah Iqbal,0,2017-12-06,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7
5,The Visitor,Neil Young / Promise of the Real,0,2017-12-05,Rock,"While still pointedly political, Neil Youngs ...",6.7


In [4]:
p4k.describe().transpose() ## Provides a summary of quantitative columns, extra transpose() method chained along

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
best,19555.0,0.053183,0.224405,0.0,0.0,0.0,0.0,1.0
score,19555.0,7.027446,1.277544,0.0,6.5,7.3,7.8,10.0


In [5]:
p4k.info() ## Provides data type and non-null counts for each column 

<class 'pandas.core.frame.DataFrame'>
Int64Index: 19555 entries, 1 to 19555
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   album   19550 non-null  object        
 1   artist  19555 non-null  object        
 2   best    19555 non-null  int64         
 3   date    19555 non-null  datetime64[ns]
 4   genre   19555 non-null  object        
 5   review  19554 non-null  object        
 6   score   19555 non-null  float64       
dtypes: datetime64[ns](1), float64(1), int64(1), object(4)
memory usage: 1.2+ MB


In [6]:
p4k.columns ## Returns an Index object with the column names
p4k.dtypes ## Returns the data types for each column
p4k.shape ## Returns a tuple with dimensions of a dataset

(19555, 7)

### 2. Selecting columns or rows from a dataframe

_key methods: .loc[], .iloc[], .value_counts()_

**Selecting columns** by name is done by passing the quoted column name, or a list of names, inside square brackets. 

In [7]:
p4k['album'] # selecting a single column
p4k[['album','artist']] # selecting multiple columns

Unnamed: 0,album,artist
1,A.M./Being There,Wilco
2,No Shame,Hopsin
3,Material Control,Glassjaw
4,Weighing of the Heart,Nabihah Iqbal
5,The Visitor,Neil Young / Promise of the Real
...,...,...
19551,1999,Cassius
19552,Let Us Replay!,Coldcut
19553,"Singles Breaking Up, Vol. 1",Don Caballero
19554,Out of Tune,Mojave 3


In [8]:
p4k.loc[:,'artist':] # A range of columns can also be selected using the colon (:)

Unnamed: 0,artist,best,date,genre,review,score
1,Wilco,1,2017-12-06,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0
2,Hopsin,0,2017-12-06,Rap,"On his corrosive fifth album, the rapper takes...",3.5
3,Glassjaw,0,2017-12-06,Rock,"On their first album in 15 years, the Long Isl...",6.6
4,Nabihah Iqbal,0,2017-12-06,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7
5,Neil Young / Promise of the Real,0,2017-12-05,Rock,"While still pointedly political, Neil Youngs ...",6.7
...,...,...,...,...,...,...
19551,Cassius,0,1999-01-26,Electronic,"Well, it's been two weeks now, and I guess it'...",4.8
19552,Coldcut,0,1999-01-26,Electronic,The marketing guys of yer average modern megac...,8.9
19553,Don Caballero,0,1999-01-12,Experimental,"Well, kids, I just went back and re-read my re...",7.2
19554,Mojave 3,0,1999-01-12,Rock,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3


**Selecting rows** by position can be done with the .iloc[] indexer, which can be sliced with a colon (:)


In [9]:
p4k.iloc[0] # Returning the first row (as a pandas series)
p4k.iloc[0:9] # Returning the first 9 rows (as a pandas dataframe); the end of the slice is excluded

Unnamed: 0,album,artist,best,date,genre,review,score
1,A.M./Being There,Wilco,1,2017-12-06,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0
2,No Shame,Hopsin,0,2017-12-06,Rap,"On his corrosive fifth album, the rapper takes...",3.5
3,Material Control,Glassjaw,0,2017-12-06,Rock,"On their first album in 15 years, the Long Isl...",6.6
4,Weighing of the Heart,Nabihah Iqbal,0,2017-12-06,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7
5,The Visitor,Neil Young / Promise of the Real,0,2017-12-05,Rock,"While still pointedly political, Neil Youngs ...",6.7
6,Perfect Angel,Minnie Riperton,1,2017-12-05,Pop/R&B,Best new reissue A deluxe reissue of Minnie Ri...,9.0
7,Everyday Is Christmas,Sia,0,2017-12-05,Pop/R&B,Sias shiny Christmas album feels inconsistent...,5.8
8,Zaytown Sorority Class of 2017,Zaytoven,0,2017-12-05,Rap,The prolific Atlanta producer enlists 17 women...,6.2
9,Songs of Experience,U2,0,2017-12-04,Rock,"Years in the making, U2s 14th studio album fi...",5.3
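
One thing that tripped me up coming from R: position-based slices with .iloc[] exclude the end point, while label-based slices with .loc[] include it. Here is a small sketch on a toy dataframe (not the Pitchfork data) with a 1-based index like the one p4k has.

```python
import pandas as pd

# Toy frame with a 1-based index, mirroring how p4k is indexed
df = pd.DataFrame(
    {"artist": ["Wilco", "Hopsin", "Glassjaw", "Nabihah Iqbal"],
     "score": [7.0, 3.5, 6.6, 7.7]},
    index=[1, 2, 3, 4],
)

by_position = df.iloc[1:3]  # positions 1 and 2; position 3 is excluded
by_label = df.loc[1:3]      # labels 1, 2 and 3; the end label is included

print(by_position.index.tolist())
print(by_label.index.tolist())
```

The same `1:3` slice returns two rows with .iloc[] and three rows with .loc[].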


Counting non-numeric data with **.value_counts()**

In [14]:
p4k['genre'].value_counts() 
# p4k['genre'].value_counts()/p4k.shape[0] -- getting proportions instead of counts

Rock            6958
Electronic      4020
None            2324
Experimental    1699
Rap             1481
Pop/R&B         1157
Metal            781
Folk/Country     700
Jazz             257
Global           178
Name: genre, dtype: int64
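
Instead of dividing by the row count by hand, proportions can also be requested directly with the `normalize` argument. A quick sketch on a made-up series of four genres:

```python
import pandas as pd

# Toy data: four reviews, two of them Rock
genres = pd.Series(["Rock", "Rock", "Rap", "Jazz"], name="genre")

counts = genres.value_counts()                # absolute counts per genre
shares = genres.value_counts(normalize=True)  # proportions that sum to 1

print(counts)
print(shares)
```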

### 3. Adding or Deleting Columns and Rows

_key methods: .insert(), .append(), .drop()_

In [146]:
p4k["Review Language"] = 'ENG' # Adding a new column with a universal value:
p4k.head()

Unnamed: 0,album,artist,best,date,genre,review,score,Review Language
1,A.M./Being There,Wilco,1,2017-12-06,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0,ENG
2,No Shame,Hopsin,0,2017-12-06,Rap,"On his corrosive fifth album, the rapper takes...",3.5,ENG
3,Material Control,Glassjaw,0,2017-12-06,Rock,"On their first album in 15 years, the Long Isl...",6.6,ENG
4,Weighing of the Heart,Nabihah Iqbal,0,2017-12-06,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7,ENG
5,The Visitor,Neil Young / Promise of the Real,0,2017-12-05,Rock,"While still pointedly political, Neil Youngs ...",6.7,ENG


In [147]:
p4k.insert(loc=0,column="Parent Company", value="Vox") # Second method for adding a column; loc sets its position
p4k.head()

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
1,Vox,A.M./Being There,Wilco,1,2017-12-06,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0,ENG
2,Vox,No Shame,Hopsin,0,2017-12-06,Rap,"On his corrosive fifth album, the rapper takes...",3.5,ENG
3,Vox,Material Control,Glassjaw,0,2017-12-06,Rock,"On their first album in 15 years, the Long Isl...",6.6,ENG
4,Vox,Weighing of the Heart,Nabihah Iqbal,0,2017-12-06,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7,ENG
5,Vox,The Visitor,Neil Young / Promise of the Real,0,2017-12-05,Rock,"While still pointedly political, Neil Youngs ...",6.7,ENG


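Adding a row with **.append()**. Note that **.append()** was deprecated in pandas 1.4 and removed in pandas 2.0, so the cells below require an older version.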

In [148]:
# creating a new, fake review
new_review = {'Parent Company':'Vox','album':'Yandhi','artist':'Kanye West',
              'best':1,'date':'December 31 2050','genre':'Rap','review':'BEST.ALBUM.EVER',
              'score':10.0,'Review Language':'ENG'}

In [149]:
p4k.append(new_review,ignore_index=True) # ignore_index allows the new row(s) to be inserted seamlessly 

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
0,Vox,A.M./Being There,Wilco,1,2017-12-06 00:00:00,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0,ENG
1,Vox,No Shame,Hopsin,0,2017-12-06 00:00:00,Rap,"On his corrosive fifth album, the rapper takes...",3.5,ENG
2,Vox,Material Control,Glassjaw,0,2017-12-06 00:00:00,Rock,"On their first album in 15 years, the Long Isl...",6.6,ENG
3,Vox,Weighing of the Heart,Nabihah Iqbal,0,2017-12-06 00:00:00,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7,ENG
4,Vox,The Visitor,Neil Young / Promise of the Real,0,2017-12-05 00:00:00,Rock,"While still pointedly political, Neil Youngs ...",6.7,ENG
...,...,...,...,...,...,...,...,...,...
19551,Vox,Let Us Replay!,Coldcut,0,1999-01-26 00:00:00,Electronic,The marketing guys of yer average modern megac...,8.9,ENG
19552,Vox,"Singles Breaking Up, Vol. 1",Don Caballero,0,1999-01-12 00:00:00,Experimental,"Well, kids, I just went back and re-read my re...",7.2,ENG
19553,Vox,Out of Tune,Mojave 3,0,1999-01-12 00:00:00,Rock,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3,ENG
19554,Vox,Left for Dead in Malaysia,Neil Hamburger,0,1999-01-05 00:00:00,,Neil Hamburger's third comedy release is a des...,6.5,ENG
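
On pandas 2.0 and later, where .append() no longer exists, the same result can be had with `pd.concat`. This is a minimal sketch on a two-row toy dataframe rather than p4k, with only three of the columns.

```python
import pandas as pd

# Toy stand-in for p4k with just three columns
reviews = pd.DataFrame({"album": ["No Shame", "Material Control"],
                        "artist": ["Hopsin", "Glassjaw"],
                        "score": [3.5, 6.6]})
new_review = {"album": "Yandhi", "artist": "Kanye West", "score": 10.0}

# Wrap the dict in a one-row DataFrame, then stack it under the existing rows
extended = pd.concat([reviews, pd.DataFrame([new_review])], ignore_index=True)

print(extended)
```

As with .append(), the original dataframe is left untouched and a new one is returned.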


Dropping rows and columns with **.drop()**

In [150]:
p4k.drop(1) # returns a dataframe without the observation with index label 1 

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
2,Vox,No Shame,Hopsin,0,2017-12-06,Rap,"On his corrosive fifth album, the rapper takes...",3.5,ENG
3,Vox,Material Control,Glassjaw,0,2017-12-06,Rock,"On their first album in 15 years, the Long Isl...",6.6,ENG
4,Vox,Weighing of the Heart,Nabihah Iqbal,0,2017-12-06,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7,ENG
5,Vox,The Visitor,Neil Young / Promise of the Real,0,2017-12-05,Rock,"While still pointedly political, Neil Youngs ...",6.7,ENG
6,Vox,Perfect Angel,Minnie Riperton,1,2017-12-05,Pop/R&B,Best new reissue A deluxe reissue of Minnie Ri...,9.0,ENG
...,...,...,...,...,...,...,...,...,...
19551,Vox,1999,Cassius,0,1999-01-26,Electronic,"Well, it's been two weeks now, and I guess it'...",4.8,ENG
19552,Vox,Let Us Replay!,Coldcut,0,1999-01-26,Electronic,The marketing guys of yer average modern megac...,8.9,ENG
19553,Vox,"Singles Breaking Up, Vol. 1",Don Caballero,0,1999-01-12,Experimental,"Well, kids, I just went back and re-read my re...",7.2,ENG
19554,Vox,Out of Tune,Mojave 3,0,1999-01-12,Rock,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3,ENG


In [151]:
p4k.drop("Parent Company", axis=1)
# or
p4k.drop("Parent Company", axis="columns")

# multiple columns
# p4k.drop(["Parent Company","date"], axis="columns")

Unnamed: 0,album,artist,best,date,genre,review,score,Review Language
1,A.M./Being There,Wilco,1,2017-12-06,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0,ENG
2,No Shame,Hopsin,0,2017-12-06,Rap,"On his corrosive fifth album, the rapper takes...",3.5,ENG
3,Material Control,Glassjaw,0,2017-12-06,Rock,"On their first album in 15 years, the Long Isl...",6.6,ENG
4,Weighing of the Heart,Nabihah Iqbal,0,2017-12-06,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7,ENG
5,The Visitor,Neil Young / Promise of the Real,0,2017-12-05,Rock,"While still pointedly political, Neil Youngs ...",6.7,ENG
...,...,...,...,...,...,...,...,...
19551,1999,Cassius,0,1999-01-26,Electronic,"Well, it's been two weeks now, and I guess it'...",4.8,ENG
19552,Let Us Replay!,Coldcut,0,1999-01-26,Electronic,The marketing guys of yer average modern megac...,8.9,ENG
19553,"Singles Breaking Up, Vol. 1",Don Caballero,0,1999-01-12,Experimental,"Well, kids, I just went back and re-read my re...",7.2,ENG
19554,Out of Tune,Mojave 3,0,1999-01-12,Rock,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3,ENG


### 4. Filtering

_key methods: .isin(), .between(), .where(), .query()_


In [22]:
# Boolean filtering
p4k[p4k["artist"]=="Prince"] 
p4k[p4k["genre"] == "Global"] 
#p4k[p4k["genre"] != "Metal"] # Filter negated with '!='
p4k[p4k["score"]>9.5]

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
1599,Vox,"One Nite Alone, The Aftershow: It Ain't Over!",Prince,1,September 1 2016,Pop/R&B,Best new reissue Originally released as part o...,8.6,ENG
2042,Vox,"Sign ""O"" the Times",Prince,0,April 30 2016,Pop/R&B,Choosing a single high point from Prince's glo...,10.0,ENG
2043,Vox,1999,Prince,0,April 30 2016,Pop/R&B,1999 is the greatest album ever made about par...,10.0,ENG
2047,Vox,Dirty Mind,Prince,0,April 29 2016,Pop/R&B,Princes first fully actualized album is an un...,10.0,ENG
2048,Vox,Controversy,Prince,0,April 29 2016,Pop/R&B,Controversy emerged in 1981 at a pivotal time ...,9.0,ENG
2444,Vox,HITNRUN Phase Two,Prince,0,January 8 2016,Pop/R&B,The second of Prince's HITNRUN series is anoth...,4.7,ENG
2779,Vox,HITNRUN Phase One,Prince,0,September 10 2015,Pop/R&B,"Prince's new effort, exclusive to Jay Z's Tida...",4.5,ENG
12334,Vox,Planet Earth,Prince,0,July 23 2007,Pop/R&B,So far this year Prince has wowed at the Super...,4.8,ENG
13400,Vox,Ultimate Prince,Prince,0,September 5 2006,Pop/R&B,This Prince best of covers the Warner Brothers...,8.6,ENG
13946,Vox,3121,Prince,0,March 20 2006,Pop/R&B,"On his latest release, the rock legend betters...",6.0,ENG


In [25]:
# Multiple conditions
condition1 = p4k["score"]>9.3
condition2 = p4k["genre"] == "Global"
p4k[condition1 & condition2] # filtering on both

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
1568,Vox,Caetano Veloso,Caetano Veloso,0,September 11 2016,Global,"In 1968, the Brazilian pop singer began a Trop...",9.4,ENG


In [26]:
p4k[condition1 | condition2] # filtering on either condition

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
14,Vox,Master of Puppets,Metallica,1,December 2 2017,Metal,"Best new reissue In 1986, Metallica released i...",10.0,ENG
119,Vox,Walk Among Us,Misfits,0,October 31 2017,Metal,"They were outliers when they started, but by t...",9.4,ENG
128,Vox,I Can Hear the Heart Beating as One,Yo La Tengo,0,October 29 2017,Rock,"Twenty years on from its original release, Yo ...",9.7,ENG
154,Vox,The Queen Is Dead,The Smiths,0,October 22 2017,Rock,"Newly reissued as a boxed set, the Smiths 198...",10.0,ENG
218,Vox,Ash,Ibeyi,1,October 4 2017,Global,"Best new music On their second album, the Fren...",8.3,ENG
...,...,...,...,...,...,...,...,...,...
19473,Vox,Agaetis Byrjun,Sigur Rós,0,June 1 1999,Rock,Icelandic lore tells of the Hidden People who ...,9.4,ENG
19475,Vox,Livro,Caetano Veloso,0,June 1 1999,Global,I heard somewhere that a person's tastes chang...,9.0,ENG
19490,Vox,Mule Variations,Tom Waits,0,April 27 1999,Rock,I once took a poetry workshop taught by a guy ...,9.5,ENG
19520,Vox,Brand New Secondhand,Roots Manuva,0,March 23 1999,Electronic,"For politcially unaware, socially unconscious,...",9.5,ENG
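
The conditions can also be written inline. In that case each comparison needs its own parentheses, because `&` and `|` bind more tightly than `>` or `==`. A small sketch on made-up rows, which also shows negation with `~`:

```python
import pandas as pd

# Toy data, not the Pitchfork reviews
df = pd.DataFrame({"genre": ["Global", "Rock", "Global", "Metal"],
                   "score": [9.4, 9.7, 8.3, 10.0]})

both = df[(df["score"] > 9.3) & (df["genre"] == "Global")]    # AND
either = df[(df["score"] > 9.3) | (df["genre"] == "Global")]  # OR
not_global = df[~(df["genre"] == "Global")]                   # NOT

print(len(both), len(either), len(not_global))
```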


The **.isin()** method

In [27]:
p4k[p4k["genre"].isin(["Global","Jazz"])] 

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
28,Vox,4444,Sam Gendel,0,November 29 2017,Jazz,Sam Gendels smoothly psychedelic debut is dom...,7.0,ENG
84,Vox,Tauhid/Jewels of Thought/Deaf Dumb Blind (Summ...,Pharoah Sanders,1,November 10 2017,Jazz,Best new reissue Best new reissue 1 / 3 Albums...,8.2,ENG
157,Vox,The Centennial Trilogy,Christian Scott aTunde Adjuah,0,October 21 2017,Jazz,1 / 3 Albums On the three albums that compose ...,7.6,ENG
178,Vox,The Magic City / My Brother the Wind Vol. 1,Sun Ra and His Arkestra,0,October 16 2017,Jazz,1 / 2 Albums Sun Ra manifested an ecstatic vis...,8.5,ENG
208,Vox,Dreams and Daggers,Cécile McLorin Salvant,0,October 7 2017,Jazz,The young jazz singers live double album show...,7.6,ENG
...,...,...,...,...,...,...,...,...,...
19261,Vox,Permutation,Bill Laswell,0,March 31 2000,Global,"If there's one thing Bill Laswell knows, it's ...",6.0,ENG
19276,Vox,Expensive Shit/He Miss Road,Fela Kuti,0,March 21 2000,Global,Afro-beat pioneer Fela Kuti was never more pis...,8.5,ENG
19280,Vox,Treader,Spring Heel Jack,0,March 21 2000,Jazz,Isn't it kind of alarming that the once cuttin...,5.4,ENG
19467,Vox,Interstellar Space Revisited: The Music of Joh...,Gregg Bendian / Nels Cline,0,June 8 1999,Jazz,A friend of mine once remarked about the later...,7.9,ENG


The **.between()** method

In [28]:
p4k[p4k["score"].between(3.4,3.6)] 

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
2,Vox,No Shame,Hopsin,0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5,ENG
1200,Vox,December 99th,Yasiin Bey,0,January 2 2017,Rap,"On December 99th, Yasiin Bey (fka Mos Def) sou...",3.5,ENG
1348,Vox,Collage,The Chainsmokers,0,November 9 2016,Electronic,The massive lite-EDM duo the Chainsmokers port...,3.5,ENG
1775,Vox,New Introductory Lectures on the System of Tra...,Kel Valhaal,0,July 15 2016,Experimental,Liturgy frontman Hunter Hunt-Hendrix's latest ...,3.5,ENG
1808,Vox,Primary Colours,Magic!,0,July 6 2016,Pop/R&B,"After scoring a #1 hit with 2014's ""Rude,"" th...",3.5,ENG
...,...,...,...,...,...,...,...,...,...
18797,Vox,Bleed American,Jimmy Eat World,0,August 21 2001,Rock,Are you a 15-year-old TRL addict looking for a...,3.5,ENG
19072,Vox,Dream Signals in Full Circles,Tristeza,0,September 26 2000,Rock,"In the beginning, there was Nothing. And it wa...",3.5,ENG
19336,Vox,The Past Was Faster,Kelley Stoltz,0,December 14 1999,Rock,There's an upside and a downside to the perpet...,3.5,ENG
19394,Vox,Cobra and Phases Group Play Voltage in the Mil...,Stereolab,0,September 21 1999,Experimental,"""Okay, Brent, this is getting really old."" ""Wh...",3.4,ENG
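
By default .between() includes both end points, which is why scores of exactly 3.4 show up above. The sketch below assumes pandas 1.3 or later, where the `inclusive` argument takes a string.

```python
import pandas as pd

scores = pd.Series([3.3, 3.4, 3.5, 3.6, 3.7])

closed = scores.between(3.4, 3.6)                      # both end points count (default)
open_ = scores.between(3.4, 3.6, inclusive="neither")  # end points excluded

print(closed.tolist())
print(open_.tolist())
```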


The **.where()** method returns a dataframe of the same shape as the original, with NAs in the rows that don't meet the criteria

In [29]:
p4k.where(p4k["genre"]=="Rap")

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
1,,,,,,,,,
2,Vox,No Shame,Hopsin,0.0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5,ENG
3,,,,,,,,,
4,,,,,,,,,
5,,,,,,,,,
...,...,...,...,...,...,...,...,...,...
19551,,,,,,,,,
19552,,,,,,,,,
19553,,,,,,,,,
19554,,,,,,,,,
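
To make the difference from plain boolean filtering concrete, here is a sketch on a three-row toy dataframe. It also uses the `other` argument of .where(), which supplies a replacement value instead of NaN.

```python
import pandas as pd

# Toy data, not the Pitchfork reviews
df = pd.DataFrame({"genre": ["Rock", "Rap", "Jazz"], "best": [1, 0, 0]})
is_rap = df["genre"] == "Rap"

masked = df.where(is_rap)  # same shape; non-matching rows become NaN
filtered = df[is_rap]      # only the matching rows are kept
flagged = df["best"].where(is_rap, other=-1)  # replace non-matches with -1

print(masked.shape, filtered.shape)
print(flagged.tolist())
```

Because NaN cannot be stored in an integer column, the best column comes back as floats after .where(), as in the output above.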


The **.query()** method is very readable. Because the condition is passed as a string, column names that contain spaces must be wrapped in backticks inside that string (supported since pandas 0.25)

In [30]:
p4k.query('best==1.0')

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
1,Vox,A.M./Being There,Wilco,1,December 6 2017,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0,ENG
6,Vox,Perfect Angel,Minnie Riperton,1,December 5 2017,Pop/R&B,Best new reissue A deluxe reissue of Minnie Ri...,9.0,ENG
14,Vox,Master of Puppets,Metallica,1,December 2 2017,Metal,"Best new reissue In 1986, Metallica released i...",10.0,ENG
19,Vox,Kick,INXS,1,December 1 2017,Rock,Best new reissue The 30th-anniversary edition ...,8.4,ENG
34,Vox,Utopia,Björk,1,November 27 2017,Electronic,"Best new music Filled with flute and birdsong,...",8.4,ENG
...,...,...,...,...,...,...,...,...,...
17501,Vox,Do You Party?,The Soft Pink Truth,1,February 4 2003,Electronic,Best new music There will be big fun in town t...,8.4,ENG
17511,Vox,You Forgot It in People,Broken Social Scene,1,February 2 2003,Rock,Best new music It's a bit late to be talking a...,9.2,ENG
17551,Vox,Everything Is Good Here/Please Come Home,The Angels of Light,1,January 20 2003,,"Best new music To a certain extent, most of us...",8.6,ENG
17554,Vox,Mount Eerie,The Microphones,1,January 20 2003,Experimental,Best new music Growing up in the shadow of Mt....,8.9,ENG
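
Two .query() features I find handy are backticks for column names with spaces and `@` for referring to Python variables. A minimal sketch on a toy dataframe; `min_score` is just an illustrative variable name.

```python
import pandas as pd

# Toy data with one column name that contains a space
df = pd.DataFrame({"best": [1, 0, 1],
                   "score": [7.0, 3.5, 9.0],
                   "Review Language": ["ENG", "ENG", "ENG"]})

min_score = 8.0  # illustrative threshold, referenced with @ inside the query

high_best = df.query("best == 1 and score > @min_score")
english = df.query("`Review Language` == 'ENG'")

print(len(high_best), len(english))
```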


### 5. Ranking & Sorting

_key methods: .sort_values(), .sort_index(), .rank(), .nsmallest(), .nlargest()_

Values are sorted with the **.sort_values()** method, and the index with **.sort_index()**

In [31]:
p4k.sort_values("score",
               ascending=False, # default is ascending=True
               na_position="first", # position of na values. options include "first","last"
               inplace=True) # argument to replace 'p4k' dataframe with sorted output

In [32]:
p4k.sort_values(['score','artist'],ascending=[False,False]) # multiple parameters to sort by

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
1162,Vox,Germfree Adolescents,X-Ray Spex,0,January 15 2017,Rock,"X-Ray Spexs debut album is a brash, vivid mas...",10.0,ENG
13782,Vox,Pink Flag / Chairs Missing / 154,Wire,0,May 5 2006,Rock,1 / 3 Albums Wire were born at the dawn of pun...,10.0,ENG
6049,Vox,The Disintegration Loops,William Basinski,1,November 19 2012,Experimental,Best new reissue The four volumes of William B...,10.0,ENG
18222,Vox,Yankee Hotel Foxtrot,Wilco,0,April 21 2002,Rock,"Myth, it has been said, is the buried part of ...",10.0,ENG
1004,Vox,Weezer (Blue Album),Weezer,0,February 26 2017,Rock,"Weezers 1994 debut, filled with geeky humor, ...",10.0,ENG
...,...,...,...,...,...,...,...,...,...
15690,Vox,Travistan,Travis Morrison,0,September 27 2004,Pop/R&B,After a prestigious and fruitful career fronti...,0.0,ENG
19228,Vox,NYC Ghosts & Flowers,Sonic Youth,0,April 30 2000,Rock,"No, I have not forgotten to put the numbers in...",0.0,ENG
15048,Vox,Relaxation of the Asshole,Robert Pollard,0,April 20 2005,Rock,If more drunks would learn from Robert Pollard...,0.0,ENG
17050,Vox,Liz Phair,Liz Phair,0,June 24 2003,Rock,It could be said that Liz Phair's greatest ass...,0.0,ENG


In [33]:
p4k.sort_index(inplace=True) # reverting to original dataframe by sorting by index 
p4k

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language
1,Vox,A.M./Being There,Wilco,1,December 6 2017,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0,ENG
2,Vox,No Shame,Hopsin,0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5,ENG
3,Vox,Material Control,Glassjaw,0,December 6 2017,Rock,"On their first album in 15 years, the Long Isl...",6.6,ENG
4,Vox,Weighing of the Heart,Nabihah Iqbal,0,December 6 2017,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7,ENG
5,Vox,The Visitor,Neil Young / Promise of the Real,0,December 5 2017,Rock,"While still pointedly political, Neil Youngs ...",6.7,ENG
...,...,...,...,...,...,...,...,...,...
19551,Vox,1999,Cassius,0,January 26 1999,Electronic,"Well, it's been two weeks now, and I guess it'...",4.8,ENG
19552,Vox,Let Us Replay!,Coldcut,0,January 26 1999,Electronic,The marketing guys of yer average modern megac...,8.9,ENG
19553,Vox,"Singles Breaking Up, Vol. 1",Don Caballero,0,January 12 1999,Experimental,"Well, kids, I just went back and re-read my re...",7.2,ENG
19554,Vox,Out of Tune,Mojave 3,0,January 12 1999,Rock,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3,ENG


Ranking is done with the **.rank()** method, while the records with the most extreme values can be returned with **.nlargest()** and **.nsmallest()** 

In [34]:
p4k['score'].rank()

1         7976.0
2          402.0
3         5494.0
4        13486.5
5         5932.0
          ...   
19551     1133.0
19552    18931.0
19553     9370.0
19554     4286.0
19555     5069.5
Name: score, Length: 19555, dtype: float64

By default, **.rank()** assigns the average rank to values that are tied. Different ranking methods can be applied by using the method parameter:

In [35]:
p4k['score'].rank(method='min',ascending=False) # 'min' gives tied observations the lowest rank in their group 

1        11104.0
2        19131.0
3        13879.0
4         5691.0
5        13370.0
          ...   
19551    18368.0
19552      587.0
19553     9807.0
19554    15110.0
19555    14246.0
Name: score, Length: 19555, dtype: float64
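
The ranking methods are easier to compare side by side on a handful of values. Here is a sketch with four made-up scores, two of them tied at the top:

```python
import pandas as pd

scores = pd.Series([10.0, 10.0, 9.0, 8.0])

average = scores.rank(ascending=False)                  # ties share the mean rank
minimum = scores.rank(method="min", ascending=False)    # ties share the lowest rank
dense = scores.rank(method="dense", ascending=False)    # like 'min', but no gaps after ties

print(average.tolist())  # [1.5, 1.5, 3.0, 4.0]
print(minimum.tolist())  # [1.0, 1.0, 3.0, 4.0]
print(dense.tolist())    # [1.0, 1.0, 2.0, 3.0]
```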

In [36]:
# Adding score_rank to the dataframe as a new column
p4k['score_rank']=p4k['score'].rank(method='min',ascending=False) 
p4k.sort_values("score_rank", ascending=True, na_position="first")  

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language,score_rank
9787,Vox,Abbey Road,The Beatles,0,September 10 2009,Rock,"The perfect ending to a recording career, this...",10.0,ENG,1.0
2280,Vox,Off the Wall,Michael Jackson,1,February 24 2016,Pop/R&B,Best new reissue Off the Wall is the sound of ...,10.0,ENG,1.0
7272,Vox,The Smile Sessions,The Beach Boys,1,November 2 2011,Rock,"Best new reissue Conceived, recorded, and ulti...",10.0,ENG,1.0
9784,Vox,The Stone Roses,The Stone Roses,1,September 11 2009,Rock,Best new reissue A badly needed remaster of th...,10.0,ENG,1.0
9261,Vox,Ladies and Gentlemen We are Floating in Space ...,Spiritualized,1,March 2 2010,Experimental,"Best new reissue This new deluxe, limited prod...",10.0,ENG,1.0
...,...,...,...,...,...,...,...,...,...,...
13301,Vox,Shine On,Jet,0,October 2 2006,Rock,,0.0,ENG,19550.0
19228,Vox,NYC Ghosts & Flowers,Sonic Youth,0,April 30 2000,Rock,"No, I have not forgotten to put the numbers in...",0.0,ENG,19550.0
12223,Vox,This Is Next,Various Artists,0,August 22 2007,,Basically an ADA sampler with a promotional st...,0.0,ENG,19550.0
17050,Vox,Liz Phair,Liz Phair,0,June 24 2003,Rock,It could be said that Liz Phair's greatest ass...,0.0,ENG,19550.0


In [37]:
p4k.nsmallest(n=4, columns="score")
# Alternative on the Series, which returns the values keyed by the records' index labels
#p4k["score"].nsmallest(n=4)

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language,score_rank
12223,Vox,This Is Next,Various Artists,0,August 22 2007,,Basically an ADA sampler with a promotional st...,0.0,ENG,19550.0
13301,Vox,Shine On,Jet,0,October 2 2006,Rock,,0.0,ENG,19550.0
15048,Vox,Relaxation of the Asshole,Robert Pollard,0,April 20 2005,Rock,If more drunks would learn from Robert Pollard...,0.0,ENG,19550.0
15690,Vox,Travistan,Travis Morrison,0,September 27 2004,Pop/R&B,After a prestigious and fruitful career fronti...,0.0,ENG,19550.0


In [38]:
#p4k.nlargest(n=4, columns="score")
# Alternative on the Series, which returns the values keyed by the records' index labels
p4k["score"].nlargest(n=4)

14     10.0
154    10.0
506    10.0
566    10.0
Name: score, dtype: float64
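
With so many perfect 10s in the data, asking for the top n cuts the ties off arbitrarily. The `keep` argument controls this. A small sketch on toy data; the album names are placeholders.

```python
import pandas as pd

# Toy data: three albums tied at 10.0
df = pd.DataFrame({"album": ["A", "B", "C", "D"],
                   "score": [10.0, 10.0, 10.0, 9.0]})

top_two = df.nlargest(n=2, columns="score")               # first two of the tied rows
top_ties = df.nlargest(n=2, columns="score", keep="all")  # every row tied at the cutoff

print(len(top_two), len(top_ties))
```

With `keep="all"` the result can contain more than n rows, here three instead of two.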

### 6. Working with NAs

_key methods: .isnull(), .notnull(), .dropna(), .fillna()_

There are a variety of methods to identify, remove, and replace NA values

In [39]:
# adding an NA row to the dataframe
import numpy as np
NA_Row = {'album':np.nan,'artist':np.nan,'best':np.nan,'genre':np.nan,'review':np.nan,'score':np.nan}
NA_Row

{'album': nan,
 'artist': nan,
 'best': nan,
 'genre': nan,
 'review': nan,
 'score': nan}

In [40]:
p4k = p4k.append(NA_Row,ignore_index=True)
p4k

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language,score_rank
0,Vox,A.M./Being There,Wilco,1.0,December 6 2017,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0,ENG,11104.0
1,Vox,No Shame,Hopsin,0.0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5,ENG,19131.0
2,Vox,Material Control,Glassjaw,0.0,December 6 2017,Rock,"On their first album in 15 years, the Long Isl...",6.6,ENG,13879.0
3,Vox,Weighing of the Heart,Nabihah Iqbal,0.0,December 6 2017,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7,ENG,5691.0
4,Vox,The Visitor,Neil Young / Promise of the Real,0.0,December 5 2017,Rock,"While still pointedly political, Neil Youngs ...",6.7,ENG,13370.0
...,...,...,...,...,...,...,...,...,...,...
19551,Vox,Let Us Replay!,Coldcut,0.0,January 26 1999,Electronic,The marketing guys of yer average modern megac...,8.9,ENG,587.0
19552,Vox,"Singles Breaking Up, Vol. 1",Don Caballero,0.0,January 12 1999,Experimental,"Well, kids, I just went back and re-read my re...",7.2,ENG,9807.0
19553,Vox,Out of Tune,Mojave 3,0.0,January 12 1999,Rock,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3,ENG,15110.0
19554,Vox,Left for Dead in Malaysia,Neil Hamburger,0.0,January 5 1999,,Neil Hamburger's third comedy release is a des...,6.5,ENG,14246.0


Null values can be identified using **.isnull()**, while **.notnull()** can be used for the opposite. Both return boolean values.

In [43]:
p4k.isnull()

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language,score_rank
0,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...
19551,False,False,False,False,False,False,False,False,False,False
19552,False,False,False,False,False,False,False,False,False,False
19553,False,False,False,False,False,False,False,False,False,False
19554,False,False,False,False,False,False,False,False,False,False


In [44]:
p4k.notnull()

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language,score_rank
0,True,True,True,True,True,True,True,True,True,True
1,True,True,True,True,True,True,True,True,True,True
2,True,True,True,True,True,True,True,True,True,True
3,True,True,True,True,True,True,True,True,True,True
4,True,True,True,True,True,True,True,True,True,True
...,...,...,...,...,...,...,...,...,...,...
19551,True,True,True,True,True,True,True,True,True,True
19552,True,True,True,True,True,True,True,True,True,True
19553,True,True,True,True,True,True,True,True,True,True
19554,True,True,True,True,True,True,True,True,True,True


The **.dropna()** method allows for multiple ways to drop NaN values:

In [41]:
# drops any row containing at least one NA
p4k.dropna() 

# only removes rows where every value is NA
p4k.dropna(how="all", inplace=True) 

# removes columns containing any NA values
p4k.dropna(axis=1)

# only drops observations with nulls in the "album" column
p4k.dropna(subset=["album"]) 


Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language,score_rank
0,Vox,A.M./Being There,Wilco,1.0,December 6 2017,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0,ENG,11104.0
1,Vox,No Shame,Hopsin,0.0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5,ENG,19131.0
2,Vox,Material Control,Glassjaw,0.0,December 6 2017,Rock,"On their first album in 15 years, the Long Isl...",6.6,ENG,13879.0
3,Vox,Weighing of the Heart,Nabihah Iqbal,0.0,December 6 2017,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7,ENG,5691.0
4,Vox,The Visitor,Neil Young / Promise of the Real,0.0,December 5 2017,Rock,"While still pointedly political, Neil Youngs ...",6.7,ENG,13370.0
...,...,...,...,...,...,...,...,...,...,...
19550,Vox,1999,Cassius,0.0,January 26 1999,Electronic,"Well, it's been two weeks now, and I guess it'...",4.8,ENG,18368.0
19551,Vox,Let Us Replay!,Coldcut,0.0,January 26 1999,Electronic,The marketing guys of yer average modern megac...,8.9,ENG,587.0
19552,Vox,"Singles Breaking Up, Vol. 1",Don Caballero,0.0,January 12 1999,Experimental,"Well, kids, I just went back and re-read my re...",7.2,ENG,9807.0
19553,Vox,Out of Tune,Mojave 3,0.0,January 12 1999,Rock,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3,ENG,15110.0
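
To make the difference between these options easier to see, here is a minimal sketch on a four-row made-up dataframe (`toy` and its values are assumptions for illustration only).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: row 0 is complete, row 2 is entirely missing
toy = pd.DataFrame({
    "album": ["A", None, None, "D"],
    "genre": ["Rock", "Rap", None, None],
    "score": [7.0, 3.5, np.nan, 6.3],
})

any_na = toy.dropna()                    # keeps only row 0 (no NA anywhere)
all_na = toy.dropna(how="all")           # drops only row 2 (every value is NA)
by_album = toy.dropna(subset=["album"])  # drops rows 1 and 2 (NA in "album")
no_na_cols = toy.dropna(axis=1)          # every column has an NA, so none remain

print(any_na, all_na, by_album, no_na_cols, sep="\n\n")
```

None of these calls change `toy` itself, since `inplace` is left at its default of `False`.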


The **.fillna()** method allows for automatic filling of NA values

In [42]:
# will replace all NAs within the dataframe with 0 (not ideal, since text columns receive 0 as well)
p4k.fillna(value=0)


Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language,score_rank
0,Vox,A.M./Being There,Wilco,1.0,December 6 2017,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0,ENG,11104.0
1,Vox,No Shame,Hopsin,0.0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5,ENG,19131.0
2,Vox,Material Control,Glassjaw,0.0,December 6 2017,Rock,"On their first album in 15 years, the Long Isl...",6.6,ENG,13879.0
3,Vox,Weighing of the Heart,Nabihah Iqbal,0.0,December 6 2017,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7,ENG,5691.0
4,Vox,The Visitor,Neil Young / Promise of the Real,0.0,December 5 2017,Rock,"While still pointedly political, Neil Youngs ...",6.7,ENG,13370.0
...,...,...,...,...,...,...,...,...,...,...
19551,Vox,Let Us Replay!,Coldcut,0.0,January 26 1999,Electronic,The marketing guys of yer average modern megac...,8.9,ENG,587.0
19552,Vox,"Singles Breaking Up, Vol. 1",Don Caballero,0.0,January 12 1999,Experimental,"Well, kids, I just went back and re-read my re...",7.2,ENG,9807.0
19553,Vox,Out of Tune,Mojave 3,0.0,January 12 1999,Rock,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3,ENG,15110.0
19554,Vox,Left for Dead in Malaysia,Neil Hamburger,0.0,January 5 1999,,Neil Hamburger's third comedy release is a des...,6.5,ENG,14246.0
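
Filling every column with the same value is rarely what I want. As a hedged alternative, **.fillna()** also accepts a dictionary that maps column names to fill values. The dataframe and the fill values below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the Pitchfork dataset
toy = pd.DataFrame({
    "genre": ["Rock", None, "Jazz"],
    "score": [7.0, np.nan, 8.0],
})

# A dict gives each column its own fill value:
# a placeholder label for genre, the column mean for score
filled = toy.fillna(value={"genre": "Unknown", "score": toy["score"].mean()})
print(filled)
```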


### 7. Unique and Duplicate values

_key methods: .unique(), .nunique(), .duplicated(), .drop_duplicates()_

Two methods deal with unique values: **.unique()** returns an array of the unique values, while **.nunique()** counts them.

In [48]:
p4k['genre'].unique()

array(['Rock', 'Rap', 'Pop/R&B', 'Metal', 'None', 'Electronic',
       'Experimental', 'Folk/Country', 'Jazz', 'Global', nan],
      dtype=object)

In [50]:
# dropna=True (the default) leaves NaNs out of the count
p4k['genre'].nunique(dropna=True) 

10
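
The effect of `dropna` is easiest to see on a short series. This is a small sketch with made-up values; `genres` is a hypothetical series, not the real column.

```python
import numpy as np
import pandas as pd

# Hypothetical series with three distinct genres and two missing values
genres = pd.Series(["Rock", "Rap", "Rock", np.nan, "Jazz", np.nan])

distinct = genres.unique()                 # includes NaN as one of the values
n_without = genres.nunique()               # NaN ignored by default
n_with = genres.nunique(dropna=False)      # NaN counted once
counts = genres.value_counts(dropna=False) # frequency of every value, NaN included

print(distinct, n_without, n_with, counts, sep="\n")
```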

**.duplicated()** and **.drop_duplicates()** are methods for identifying and removing duplicate rows in a dataframe.

In [None]:
# General pattern, where df is any dataframe with a "First Name" column
df["First Name"].duplicated(keep="first") # default: every occurrence after the first is marked as a duplicate
df["First Name"].duplicated(keep="last") # runs from the bottom up: every occurrence before the last is marked
df["First Name"].duplicated(keep=False) # identifies all values that are duplicated
df[df["First Name"].duplicated(keep=False)] # keeps only rows whose value is duplicated
df[~df["First Name"].duplicated(keep=False)] # '~' negates, keeping only rows whose value has no duplicates



In [57]:
# returns a boolean Series
p4k['genre'].duplicated() 

# by default, only the first observation of a value is not seen as a duplicate
p4k['genre'].duplicated(keep="first")

# running from the bottom up
p4k['genre'].duplicated(keep="last") 


0         True
1         True
2         True
3         True
4         True
         ...  
19551    False
19552    False
19553    False
19554    False
19555    False
Name: genre, Length: 19556, dtype: bool

In [58]:
# identifies all values that are duplicated
p4k['genre'].duplicated(keep=False) 

# keeping rows whose genre is duplicated (all rows except the NA row)
p4k[p4k['genre'].duplicated(keep=False)] 

# '~' negates to keep rows whose genre has no duplicates (only the NA row)
p4k[~p4k['genre'].duplicated(keep=False)] 


Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language,score_rank
19555,,,,,,,,,,


In [60]:

# only dropping duplicates in the score column, keeping first obs
p4k.drop_duplicates(subset=["score"], keep = "first")

# dropping all duplicated artists, only returning artists w/one review
p4k.drop_duplicates(subset=["artist"], keep = False)

# Can drop duplicates across multiple columns
p4k.drop_duplicates(subset = ["genre", "date"]) 


Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language,score_rank
0,Vox,A.M./Being There,Wilco,1.0,December 6 2017,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0,ENG,11104.0
1,Vox,No Shame,Hopsin,0.0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5,ENG,19131.0
3,Vox,Weighing of the Heart,Nabihah Iqbal,0.0,December 6 2017,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7,ENG,5691.0
4,Vox,The Visitor,Neil Young / Promise of the Real,0.0,December 5 2017,Rock,"While still pointedly political, Neil Youngs ...",6.7,ENG,13370.0
5,Vox,Perfect Angel,Minnie Riperton,1.0,December 5 2017,Pop/R&B,Best new reissue A deluxe reissue of Minnie Ri...,9.0,ENG,425.0
...,...,...,...,...,...,...,...,...,...,...
19550,Vox,1999,Cassius,0.0,January 26 1999,Electronic,"Well, it's been two weeks now, and I guess it'...",4.8,ENG,18368.0
19552,Vox,"Singles Breaking Up, Vol. 1",Don Caballero,0.0,January 12 1999,Experimental,"Well, kids, I just went back and re-read my re...",7.2,ENG,9807.0
19553,Vox,Out of Tune,Mojave 3,0.0,January 12 1999,Rock,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3,ENG,15110.0
19554,Vox,Left for Dead in Malaysia,Neil Hamburger,0.0,January 5 1999,,Neil Hamburger's third comedy release is a des...,6.5,ENG,14246.0
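
Since the three `keep` options are easy to mix up, here is a minimal sketch on a six-element series. The series `artists` is made up for illustration.

```python
import pandas as pd

# Hypothetical series: "Wilco" appears three times, "Sia" twice, "Clogs" once
artists = pd.Series(["Wilco", "Sia", "Wilco", "Clogs", "Sia", "Wilco"])

first = artists.duplicated(keep="first")  # flags every repeat after the first occurrence
last = artists.duplicated(keep="last")    # flags every occurrence before the last one
every = artists.duplicated(keep=False)    # flags all occurrences of repeated values

only_once = artists[~every]               # values that appear exactly once
deduped = artists.drop_duplicates()       # first occurrence of each value

print(first.tolist(), last.tolist(), every.tolist(), sep="\n")
```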


### 8. Working with Indexes

_key methods: .sort_index(), .set_index(), .reset_index(), .get_level_values(), .swaplevel(), .stack(), .unstack()_

Indexes in pandas store the axis labels of every pandas object. An index can hold id values, categories, or datetimes, and it can combine several levels into a multi-index.

In [100]:
p4k.index

Int64Index([    1,     2,     3,     4,     5,     6,     7,     8,     9,
               10,
            ...
            19546, 19547, 19548, 19549, 19550, 19551, 19552, 19553, 19554,
            19555],
           dtype='int64', length=19555)

In [101]:
# returns an array of the data in the dataframe (the values, not the index labels)
p4k.values 

array([['A.M./Being There', 'Wilco', 1, ..., 'Rock',
        'Best new reissue 1 / 2 Albums Newly reissued and remastered, the group\x92s first two albums find Jeff Tweedy and his Chicago band transforming themselves from alt-country also-rans into a formidable rock\x91n\x92roll outfit. The nuclear detonation of Uncle Tupelo launched an alt-country arms race, with the band\x92s two chief singer-songwriters mutating from old friends into bitter enemies trying to outdo each other with their follow-up records. Jay Farrar started Son Volt with Tupelo\x92s drummer, Mike Heidorn, and released Trace, which yielded the radio hit \x93Drown\x94 and found him greeted as a visionary. Jeff Tweedy, on the other hand, rushed into the studio to record a set of demos with his new band, Wilco, barely a couple months after his old band had played its final show. Nearly a year later they released their first album, A.M., which was greeted with a big shrug from critics and fans alike. Tweedy had managed to

In [102]:
# sorting on current index
p4k.sort_index() 

Unnamed: 0,album,artist,best,date,genre,review,score
1,A.M./Being There,Wilco,1,2017-12-06,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0
2,No Shame,Hopsin,0,2017-12-06,Rap,"On his corrosive fifth album, the rapper takes...",3.5
3,Material Control,Glassjaw,0,2017-12-06,Rock,"On their first album in 15 years, the Long Isl...",6.6
4,Weighing of the Heart,Nabihah Iqbal,0,2017-12-06,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7
5,The Visitor,Neil Young / Promise of the Real,0,2017-12-05,Rock,"While still pointedly political, Neil Youngs ...",6.7
...,...,...,...,...,...,...,...
19551,1999,Cassius,0,1999-01-26,Electronic,"Well, it's been two weeks now, and I guess it'...",4.8
19552,Let Us Replay!,Coldcut,0,1999-01-26,Electronic,The marketing guys of yer average modern megac...,8.9
19553,"Singles Breaking Up, Vol. 1",Don Caballero,0,1999-01-12,Experimental,"Well, kids, I just went back and re-read my re...",7.2
19554,Out of Tune,Mojave 3,0,1999-01-12,Rock,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3


In [103]:
p4k.set_index('album') # set the data frame index using existing columns
# Note: index can be reset to default using reset_index()

Unnamed: 0_level_0,artist,best,date,genre,review,score
album,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A.M./Being There,Wilco,1,2017-12-06,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0
No Shame,Hopsin,0,2017-12-06,Rap,"On his corrosive fifth album, the rapper takes...",3.5
Material Control,Glassjaw,0,2017-12-06,Rock,"On their first album in 15 years, the Long Isl...",6.6
Weighing of the Heart,Nabihah Iqbal,0,2017-12-06,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7
The Visitor,Neil Young / Promise of the Real,0,2017-12-05,Rock,"While still pointedly political, Neil Youngs ...",6.7
...,...,...,...,...,...,...
1999,Cassius,0,1999-01-26,Electronic,"Well, it's been two weeks now, and I guess it'...",4.8
Let Us Replay!,Coldcut,0,1999-01-26,Electronic,The marketing guys of yer average modern megac...,8.9
"Singles Breaking Up, Vol. 1",Don Caballero,0,1999-01-12,Experimental,"Well, kids, I just went back and re-read my re...",7.2
Out of Tune,Mojave 3,0,1999-01-12,Rock,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3
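
A short sketch of the round trip between **.set_index()** and **.reset_index()**, on a three-row made-up dataframe (`toy` is an assumption, loosely modelled on the columns above).

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "album": ["A.M.", "No Shame", "1999"],
    "artist": ["Wilco", "Hopsin", "Cassius"],
    "score": [7.0, 3.5, 4.8],
})

by_album = toy.set_index("album")          # "album" moves from a column to the index
score_1999 = by_album.loc["1999", "score"] # label-based lookup on the new index

restored = by_album.reset_index()          # the index becomes a column again
print(by_album, restored, sep="\n\n")
```

Both methods return a new dataframe, so `toy` keeps its default integer index throughout.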


In [104]:
# multi-indexes can be set with set_index()
multindexP4k = p4k.set_index(['genre','date'])
multindexP4k.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,album,artist,best,review,score
genre,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Rock,2017-12-06,A.M./Being There,Wilco,1,Best new reissue 1 / 2 Albums Newly reissued a...,7.0
Rap,2017-12-06,No Shame,Hopsin,0,"On his corrosive fifth album, the rapper takes...",3.5
Rock,2017-12-06,Material Control,Glassjaw,0,"On their first album in 15 years, the Long Isl...",6.6
Pop/R&B,2017-12-06,Weighing of the Heart,Nabihah Iqbal,0,"On her debut LP, British producer Nabihah Iqba...",7.7
Rock,2017-12-05,The Visitor,Neil Young / Promise of the Real,0,"While still pointedly political, Neil Youngs ...",6.7


In [105]:
multindexP4k.index
# multindexP4k.index[0] # selecting first row from multi-index

MultiIndex([(        'Rock', '2017-12-06'),
            (         'Rap', '2017-12-06'),
            (        'Rock', '2017-12-06'),
            (     'Pop/R&B', '2017-12-06'),
            (        'Rock', '2017-12-05'),
            (     'Pop/R&B', '2017-12-05'),
            (     'Pop/R&B', '2017-12-05'),
            (         'Rap', '2017-12-05'),
            (        'Rock', '2017-12-04'),
            (       'Metal', '2017-12-04'),
            ...
            (  'Electronic', '1999-02-11'),
            (  'Electronic', '1999-02-09'),
            (  'Electronic', '1999-02-09'),
            ('Folk/Country', '1999-02-09'),
            (  'Electronic', '1999-02-01'),
            (  'Electronic', '1999-01-26'),
            (  'Electronic', '1999-01-26'),
            ('Experimental', '1999-01-12'),
            (        'Rock', '1999-01-12'),
            (        'None', '1999-01-05')],
           names=['genre', 'date'], length=19555)

In [106]:
multindexP4k.index.get_level_values(0)

Index(['Rock', 'Rap', 'Rock', 'Pop/R&B', 'Rock', 'Pop/R&B', 'Pop/R&B', 'Rap',
       'Rock', 'Metal',
       ...
       'Electronic', 'Electronic', 'Electronic', 'Folk/Country', 'Electronic',
       'Electronic', 'Electronic', 'Experimental', 'Rock', 'None'],
      dtype='object', name='genre', length=19555)

In [107]:
multindexP4k.index.get_level_values(1)

DatetimeIndex(['2017-12-06', '2017-12-06', '2017-12-06', '2017-12-06',
               '2017-12-05', '2017-12-05', '2017-12-05', '2017-12-05',
               '2017-12-04', '2017-12-04',
               ...
               '1999-02-11', '1999-02-09', '1999-02-09', '1999-02-09',
               '1999-02-01', '1999-01-26', '1999-01-26', '1999-01-12',
               '1999-01-12', '1999-01-05'],
              dtype='datetime64[ns]', name='date', length=19555, freq=None)

In [108]:
multindexP4k.swaplevel() # Swapping levels

Unnamed: 0_level_0,Unnamed: 1_level_0,album,artist,best,review,score
date,genre,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2017-12-06,Rock,A.M./Being There,Wilco,1,Best new reissue 1 / 2 Albums Newly reissued a...,7.0
2017-12-06,Rap,No Shame,Hopsin,0,"On his corrosive fifth album, the rapper takes...",3.5
2017-12-06,Rock,Material Control,Glassjaw,0,"On their first album in 15 years, the Long Isl...",6.6
2017-12-06,Pop/R&B,Weighing of the Heart,Nabihah Iqbal,0,"On her debut LP, British producer Nabihah Iqba...",7.7
2017-12-05,Rock,The Visitor,Neil Young / Promise of the Real,0,"While still pointedly political, Neil Youngs ...",6.7
...,...,...,...,...,...,...
1999-01-26,Electronic,1999,Cassius,0,"Well, it's been two weeks now, and I guess it'...",4.8
1999-01-26,Electronic,Let Us Replay!,Coldcut,0,The marketing guys of yer average modern megac...,8.9
1999-01-12,Experimental,"Singles Breaking Up, Vol. 1",Don Caballero,0,"Well, kids, I just went back and re-read my re...",7.2
1999-01-12,Rock,Out of Tune,Mojave 3,0,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3


In [109]:
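## Extract rows from a MultiIndex dataframe
# The tuple follows the level order of the index: (genre, date) here, (date, genre) after swaplevel()
# multindexP4k.loc[("Rock", "2017-12-05")]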
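
As a runnable sketch of multi-index selection, the example below builds a tiny made-up dataframe (`toy` and `mi` are hypothetical names) and selects by the outer level, by a full key, and by the inner level alone.

```python
import pandas as pd

# Hypothetical toy data with one row per (genre, date) pair
toy = pd.DataFrame({
    "genre": ["Rock", "Rap", "Rock", "Rap"],
    "date": pd.to_datetime(["2017-12-06", "2017-12-06", "2017-12-05", "2017-12-05"]),
    "score": [7.0, 3.5, 6.7, 6.2],
})

# Sorting the index keeps label lookups fast and avoids performance warnings
mi = toy.set_index(["genre", "date"]).sort_index()

rock = mi.loc["Rock"]                                       # all rows for the outer level
one = mi.loc[("Rock", pd.Timestamp("2017-12-05")), "score"] # full key plus a column
on_day = mi.xs(pd.Timestamp("2017-12-05"), level="date")    # select on the inner level only

print(rock, one, on_day, sep="\n\n")
```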

In [110]:
multindexP4k

Unnamed: 0_level_0,Unnamed: 1_level_0,album,artist,best,review,score
genre,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Rock,2017-12-06,A.M./Being There,Wilco,1,Best new reissue 1 / 2 Albums Newly reissued a...,7.0
Rap,2017-12-06,No Shame,Hopsin,0,"On his corrosive fifth album, the rapper takes...",3.5
Rock,2017-12-06,Material Control,Glassjaw,0,"On their first album in 15 years, the Long Isl...",6.6
Pop/R&B,2017-12-06,Weighing of the Heart,Nabihah Iqbal,0,"On her debut LP, British producer Nabihah Iqba...",7.7
Rock,2017-12-05,The Visitor,Neil Young / Promise of the Real,0,"While still pointedly political, Neil Youngs ...",6.7
...,...,...,...,...,...,...
Electronic,1999-01-26,1999,Cassius,0,"Well, it's been two weeks now, and I guess it'...",4.8
Electronic,1999-01-26,Let Us Replay!,Coldcut,0,The marketing guys of yer average modern megac...,8.9
Experimental,1999-01-12,"Singles Breaking Up, Vol. 1",Don Caballero,0,"Well, kids, I just went back and re-read my re...",7.2
Rock,1999-01-12,Out of Tune,Mojave 3,0,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3


When working with multiple indexes, the **.stack()** and **.unstack()** methods can help gather or spread the data

In [111]:
multindexP4k.stack() # gathers all the column data into a new innermost index level, leaving a single Series that holds all the values
multindexP4k.stack().to_frame() # converting to a dataframe

genre  date              
Rock   2017-12-06  album                                      A.M./Being There
                   artist                                                Wilco
                   best                                                      1
                   review    Best new reissue 1 / 2 Albums Newly reissued a...
                   score                                                     7
                                                   ...                        
None   1999-01-05  album                             Left for Dead in Malaysia
                   artist                                       Neil Hamburger
                   best                                                      0
                   review    Neil Hamburger's third comedy release is a des...
                   score                                                   6.5
Length: 97769, dtype: object

In [114]:
multindexP4k.stack().to_frame().index

MultiIndex([('Rock', '2017-12-06',  'album'),
            ('Rock', '2017-12-06', 'artist'),
            ('Rock', '2017-12-06',   'best'),
            ('Rock', '2017-12-06', 'review'),
            ('Rock', '2017-12-06',  'score'),
            ( 'Rap', '2017-12-06',  'album'),
            ( 'Rap', '2017-12-06', 'artist'),
            ( 'Rap', '2017-12-06',   'best'),
            ( 'Rap', '2017-12-06', 'review'),
            ( 'Rap', '2017-12-06',  'score'),
            ...
            ('Rock', '1999-01-12',  'album'),
            ('Rock', '1999-01-12', 'artist'),
            ('Rock', '1999-01-12',   'best'),
            ('Rock', '1999-01-12', 'review'),
            ('Rock', '1999-01-12',  'score'),
            ('None', '1999-01-05',  'album'),
            ('None', '1999-01-05', 'artist'),
            ('None', '1999-01-05',   'best'),
            ('None', '1999-01-05', 'review'),
            ('None', '1999-01-05',  'score')],
           names=['genre', 'date', None], length=97769)

In [119]:
# Unstack does the opposite: it spreads the stacked data back into multiple columns
s = multindexP4k.stack()
# s.unstack() works on the innermost index level by default. It throws an error on this data because the index contains duplicate entries

ValueError: Index contains duplicate entries, cannot reshape
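
The error comes from the data, not from the method: several reviews share the same genre and date. On an index without duplicates the round trip works, as this sketch on made-up data shows (`toy` is a hypothetical dataframe with a unique two-level index).

```python
import pandas as pd

# Hypothetical toy data: every (genre, date) pair appears only once
toy = pd.DataFrame(
    {"best": [1, 0, 0], "score": [7.0, 3.5, 6.7]},
    index=pd.MultiIndex.from_tuples(
        [("Rock", "2017-12-06"), ("Rap", "2017-12-06"), ("Rock", "2017-12-05")],
        names=["genre", "date"],
    ),
)

stacked = toy.stack()     # column names become a third, innermost index level
wide = stacked.unstack()  # the innermost level is spread back into columns

print(stacked, wide, sep="\n\n")
```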

### 9. Renaming Labels and Columns

_key methods: .rename()_


In [67]:
# Renaming columns with a dictionary. This method can rename all or some columns
p4k.columns
p4k.rename(columns = {"Parent Company" : "parent_company",
                      "album" : "album_title",
                      "artist" : "artist_name",
                      "best" : "best_reviewed",
                      "date" : "review_date",
                      "genre" : "genre_type",
                      "review" : "review_text",
                      "score" : "review_score",
                      "Review Language" : "review_lang",
                      "score_rank" : "ranked_score"}
                       ,inplace=False)


Unnamed: 0,parent_company,album_title,artist_name,best_reviewed,review_date,genre_type,review_text,review_score,review_lang,ranked_score
0,Vox,A.M./Being There,Wilco,1.0,December 6 2017,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0,ENG,11104.0
1,Vox,No Shame,Hopsin,0.0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5,ENG,19131.0
2,Vox,Material Control,Glassjaw,0.0,December 6 2017,Rock,"On their first album in 15 years, the Long Isl...",6.6,ENG,13879.0
3,Vox,Weighing of the Heart,Nabihah Iqbal,0.0,December 6 2017,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7,ENG,5691.0
4,Vox,The Visitor,Neil Young / Promise of the Real,0.0,December 5 2017,Rock,"While still pointedly political, Neil Youngs ...",6.7,ENG,13370.0
...,...,...,...,...,...,...,...,...,...,...
19551,Vox,Let Us Replay!,Coldcut,0.0,January 26 1999,Electronic,The marketing guys of yer average modern megac...,8.9,ENG,587.0
19552,Vox,"Singles Breaking Up, Vol. 1",Don Caballero,0.0,January 12 1999,Experimental,"Well, kids, I just went back and re-read my re...",7.2,ENG,9807.0
19553,Vox,Out of Tune,Mojave 3,0.0,January 12 1999,Rock,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3,ENG,15110.0
19554,Vox,Left for Dead in Malaysia,Neil Hamburger,0.0,January 5 1999,,Neil Hamburger's third comedy release is a des...,6.5,ENG,14246.0


In [None]:
# This method requires renaming ALL columns
# p4k.columns = ["c1","c2","c3","c4","c5","c6","c7","c8","c9","c10"]

In [72]:
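# Keys must match the type of the index labels. The index here holds integers, so the
# string keys '1' and '2' match nothing and the output is unchanged; use {1: 'One', 2: 'Two'} instead
p4k.rename(index = {'1' : 'One',
                    '2' : 'Two'})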

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language,score_rank
0,Vox,A.M./Being There,Wilco,1.0,December 6 2017,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0,ENG,11104.0
1,Vox,No Shame,Hopsin,0.0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5,ENG,19131.0
2,Vox,Material Control,Glassjaw,0.0,December 6 2017,Rock,"On their first album in 15 years, the Long Isl...",6.6,ENG,13879.0
3,Vox,Weighing of the Heart,Nabihah Iqbal,0.0,December 6 2017,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7,ENG,5691.0
4,Vox,The Visitor,Neil Young / Promise of the Real,0.0,December 5 2017,Rock,"While still pointedly political, Neil Youngs ...",6.7,ENG,13370.0
...,...,...,...,...,...,...,...,...,...,...
19551,Vox,Let Us Replay!,Coldcut,0.0,January 26 1999,Electronic,The marketing guys of yer average modern megac...,8.9,ENG,587.0
19552,Vox,"Singles Breaking Up, Vol. 1",Don Caballero,0.0,January 12 1999,Experimental,"Well, kids, I just went back and re-read my re...",7.2,ENG,9807.0
19553,Vox,Out of Tune,Mojave 3,0.0,January 12 1999,Rock,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3,ENG,15110.0
19554,Vox,Left for Dead in Malaysia,Neil Hamburger,0.0,January 5 1999,,Neil Hamburger's third comedy release is a des...,6.5,ENG,14246.0
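
To illustrate why the index above came back unchanged, here is a minimal sketch on a made-up dataframe with an integer index. It also shows that **.rename()** accepts a function in place of a dictionary.

```python
import pandas as pd

# Hypothetical toy data with integer index labels 1, 2, 3
toy = pd.DataFrame({"score": [7.0, 3.5, 6.6]}, index=[1, 2, 3])

by_str = toy.rename(index={"1": "One"})          # string key matches nothing
by_int = toy.rename(index={1: "One", 2: "Two"})  # integer keys match the labels
upper = toy.rename(columns=str.upper)            # a function is applied to every label

print(by_str.index.tolist(), by_int.index.tolist(), upper.columns.tolist())
```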


### 10. Sampling

_key methods: .sample()_

A random sample, sized either as an integer count or as a fraction of the total, can be drawn with **.sample()**

In [63]:
p4k.sample(frac=0.01)
p4k.sample(n=3)

Unnamed: 0,Parent Company,album,artist,best,date,genre,review,score,Review Language,score_rank
18004,Vox,Dirty Vegas,Dirty Vegas,0.0,July 8 2002,Electronic,"Used to be, way back in the twentieth century,...",4.4,ENG,18683.0
9255,Vox,The Creatures in the Garden of Lady Walton / V...,Clogs,0.0,March 3 2010,Rock,1 / 2 Albums The National's Bryce Dessner's po...,8.2,ENG,2308.0
7502,Vox,I Am an Attic,Nick Diamonds,0.0,August 26 2011,Rock,The busy collaborator and frontman (Mister Hea...,6.4,ENG,14728.0


In [64]:
# By default rows are sampled, but columns can be sampled as well
p4k.sample(n=2,axis=1)
p4k.sample(n=2,axis="columns")

Unnamed: 0,review,score_rank
0,Best new reissue 1 / 2 Albums Newly reissued a...,11104.0
1,"On his corrosive fifth album, the rapper takes...",19131.0
2,"On their first album in 15 years, the Long Isl...",13879.0
3,"On her debut LP, British producer Nabihah Iqba...",5691.0
4,"While still pointedly political, Neil Youngs ...",13370.0
...,...,...
19551,The marketing guys of yer average modern megac...,587.0
19552,"Well, kids, I just went back and re-read my re...",9807.0
19553,"Out of Tune is a Steve Martin album. Yes, I'll...",15110.0
19554,Neil Hamburger's third comedy release is a des...,14246.0
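
Each call to **.sample()** returns a different result unless a seed is fixed. The sketch below uses the `random_state` parameter on a made-up dataframe to make samples reproducible; the seed values are arbitrary.

```python
import pandas as pd

# Hypothetical toy data: 100 rows
toy = pd.DataFrame({"score": range(100)})

a = toy.sample(n=5, random_state=42)
b = toy.sample(n=5, random_state=42)       # same seed, same rows
tenth = toy.sample(frac=0.1, random_state=0)

print(a.equals(b), len(tenth))
```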


### 11. Grouping and Aggregating

_key methods: .groupby(), .agg()_

In [110]:
p4k.groupby('genre').size()

genre
Electronic      4020
Experimental    1699
Folk/Country     700
Global           178
Jazz             257
Metal            781
None            2324
Pop/R&B         1157
Rap             1481
Rock            6958
dtype: int64

In [111]:
p4k.groupby(['genre','best']).size()

genre         best
Electronic    0       3827
              1        193
Experimental  0       1588
              1        111
Folk/Country  0        667
              1         33
Global        0        169
              1          9
Jazz          0        232
              1         25
Metal         0        759
              1         22
None          0       2269
              1         55
Pop/R&B       0       1070
              1         87
Rap           0       1399
              1         82
Rock          0       6535
              1        423
dtype: int64

In [114]:
genre_group = p4k.groupby('genre')

In [117]:
genre_group.sum() # in the pandas version used here only numeric columns are summed; pandas 2.0+ needs numeric_only=True for that behaviour

Unnamed: 0_level_0,best,score
genre,Unnamed: 1_level_1,Unnamed: 2_level_1
Electronic,193,27904.1
Experimental,111,12503.0
Folk/Country,33,5053.9
Global,9,1323.4
Jazz,25,1945.1
Metal,22,5451.7
,55,16315.9
Pop/R&B,87,8039.2
Rap,82,10293.4
Rock,423,48592.0


In [124]:
genre_group["score"].std() # standard deviation of scores by group

genre
Electronic      1.286230
Experimental    1.084305
Folk/Country    1.039006
Global          1.020147
Jazz            1.160742
Metal           1.382885
None            1.232897
Pop/R&B         1.251231
Rap             1.259559
Rock            1.336472
Name: score, dtype: float64

In [125]:
genre_group["score"].describe() # summary statistics of scores by group

Unnamed: 0_level_0,count,mean,std,min,25%,50%,75%,max
genre,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Electronic,4020.0,6.941318,1.28623,0.2,6.4,7.2,7.8,10.0
Experimental,1699.0,7.359035,1.084305,0.3,6.9,7.5,8.0,10.0
Folk/Country,700.0,7.219857,1.039006,2.3,6.8,7.4,7.9,10.0
Global,178.0,7.434831,1.020147,2.2,7.1,7.7,8.0,9.4
Jazz,257.0,7.568482,1.160742,1.0,7.0,7.7,8.2,10.0
Metal,781.0,6.98041,1.382885,0.2,6.5,7.4,7.9,10.0
,2324.0,7.020611,1.232897,0.0,6.5,7.2,7.8,10.0
Pop/R&B,1157.0,6.948315,1.251231,0.0,6.3,7.1,7.7,10.0
Rap,1481.0,6.950304,1.259559,1.0,6.4,7.2,7.8,10.0
Rock,6958.0,6.983616,1.336472,0.0,6.4,7.2,7.8,10.0


In [126]:
best_genre = p4k.groupby(['genre','best'])

In [128]:
# returning the most recent review date for each combination of genre and best
best_genre['date'].max() 

genre         best
Electronic    0      2017-11-30
              1      2017-11-27
Experimental  0      2017-11-30
              1      2017-10-04
Folk/Country  0      2017-11-29
              1      2017-08-11
Global        0      2017-09-29
              1      2017-10-04
Jazz          0      2017-11-29
              1      2017-11-10
Metal         0      2017-12-04
              1      2017-12-02
None          0      2017-12-01
              1      2017-10-25
Pop/R&B       0      2017-12-06
              1      2017-12-05
Rap           0      2017-12-06
              1      2017-10-07
Rock          0      2017-12-06
              1      2017-12-06
Name: date, dtype: datetime64[ns]

**.agg()** aggregates data using one or more operations over the specified axis.



In [131]:
genre_group.agg({"best" : "sum",
             "score" : "mean",
             "date": "min"}) # can specify methods by columns using a python dictionary

Unnamed: 0_level_0,best,score,date
genre,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Electronic,193,6.941318,1999-01-26
Experimental,111,7.359035,1999-01-12
Folk/Country,33,7.219857,1999-02-09
Global,9,7.434831,1999-06-01
Jazz,25,7.568482,1999-06-08
Metal,22,6.98041,1999-02-16
,55,7.020611,1999-01-05
Pop/R&B,87,6.948315,1999-02-23
Rap,82,6.950304,1999-10-12
Rock,423,6.983616,1999-01-12


In [132]:
# passing a list instead applies every listed method to each column
genre_group.agg(['size','sum','mean'])

Unnamed: 0_level_0,best,best,best,score,score,score
Unnamed: 0_level_1,size,sum,mean,size,sum,mean
genre,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
Electronic,4020,193,0.04801,4020,27904.1,6.941318
Experimental,1699,111,0.065333,1699,12503.0,7.359035
Folk/Country,700,33,0.047143,700,5053.9,7.219857
Global,178,9,0.050562,178,1323.4,7.434831
Jazz,257,25,0.097276,257,1945.1,7.568482
Metal,781,22,0.028169,781,5451.7,6.98041
,2324,55,0.023666,2324,16315.9,7.020611
Pop/R&B,1157,87,0.075194,1157,8039.2,6.948315
Rap,1481,82,0.055368,1481,10293.4,6.950304
Rock,6958,423,0.060793,6958,48592.0,6.983616
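
The list form produces two-level column names, which can be awkward downstream. A hedged alternative is named aggregation, where each keyword becomes an output column and maps to a `(column, function)` pair. The dataframe and the output names (`n_reviews`, `n_best`, `mean_score`) are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data, not the Pitchfork dataset
toy = pd.DataFrame({
    "genre": ["Rock", "Rock", "Rap", "Rap", "Jazz"],
    "best": [1, 0, 0, 1, 0],
    "score": [7.0, 6.0, 3.5, 8.5, 9.0],
})

# Each keyword names an output column: (source column, aggregation)
summary = toy.groupby("genre").agg(
    n_reviews=("score", "size"),
    n_best=("best", "sum"),
    mean_score=("score", "mean"),
)
print(summary)
```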


### 12. Date and Times

_key attributes: .year, .month, .day, .dayofweek, .hour, .minute, .second_

_key methods: .Timestamp(), .to_datetime(), .date_range(), .truncate(), .DateOffset(), .Timedelta()_

The following functions for working more closely with dates come from both pandas and the [datetime](https://docs.python.org/3/library/datetime.html) library.

In [133]:
import datetime as dt

Creating dates and datetimes

In [139]:
# Creating dates
someday = dt.date(2016,4,12)
someday

datetime.date(2016, 4, 12)

In [135]:
# extracting date values 
someday.year
someday.month
someday.day

12

In [140]:
# Creating datetimes
somedatetime = dt.datetime(2016,4,12,8,13,59)
somedatetime

datetime.datetime(2016, 4, 12, 8, 13, 59)

In [141]:
# extracting date and time values
somedatetime.year
somedatetime.month
somedatetime.day
somedatetime.weekday() # Monday is 0; .dayofweek is the equivalent attribute on a pandas Timestamp
somedatetime.hour
somedatetime.minute
somedatetime.second

59

The pandas **.Timestamp()** constructor and **.to_datetime()** function can work with a variety of input formats

In [144]:
pd.Timestamp('2015-3-31')
pd.Timestamp('2015/3/31')
pd.Timestamp('2015, 3, 31')

Timestamp('2015-03-31 00:00:00')

In [145]:
pd.Timestamp('1/1/2012 08:35:15')
pd.Timestamp('1/1/2012 08:35:15 PM')
pd.Timestamp(dt.datetime(2012,1,1,8,35,5))

Timestamp('2012-01-01 08:35:05')

In [None]:
pd.to_datetime("2001-04-19")
pd.to_datetime(dt.date(2015,1,1))
pd.to_datetime(dt.datetime(2015,2,2,14,35,20))
# pandas 2.0+ may need format="mixed" to parse a list that mixes date formats like this one
pd.to_datetime(["2015-01-03","2014/02/08","2016","July 4th, 1996"])

In [None]:
times = pd.Series(["2015-01-03","2014/02/08","2016","July 4th, 1996"])
times

In [None]:
pd.to_datetime(times)

pd.to_datetime can also work with bad data provided the errors are coerced:

In [None]:
bad_dates = pd.Series(["July 14th 1995","Hello","2015-2-31"])

In [None]:
pd.to_datetime(bad_dates)
# Raises an error because "Hello" and "2015-2-31" are not valid dates.
# Setting errors="coerce" converts the values that cannot be parsed to NaT (not a time).

In [None]:
pd.to_datetime(bad_dates,errors="coerce")
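
The contrast between the two calls can be sketched as follows. I use ISO-formatted strings here (an assumption that differs from the series above) so that the example does not depend on how a given pandas version infers mixed formats.

```python
import pandas as pd

# Hypothetical series: one valid date, one non-date, one impossible date
bad_dates = pd.Series(["1995-07-14", "Hello", "2015-02-31"])

# Without coercion the first unparseable value raises an error
try:
    pd.to_datetime(bad_dates)
    raised = False
except (ValueError, TypeError):
    raised = True

# With errors="coerce" unparseable values become NaT instead
parsed = pd.to_datetime(bad_dates, errors="coerce")
print(raised)
print(parsed)
```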

The function **pd.date_range()** is used to create a range of dates at different frequencies

In [None]:
times = pd.date_range(start="2016-01-01",end="2016-01-11",freq="D")

In [None]:
pd.date_range(start="2016-01-01",end="2016-01-11",freq="2D") # 2-day increment
pd.date_range(start="2016-01-01",end="2016-01-11",freq="B") # B - business days (weekdays only)
pd.date_range(start="2016-01-01",end="2016-01-16",freq="W") # W - weekly, anchored on Sunday
pd.date_range(start="2016-01-01",end="2016-01-16",freq="W-FRI") # W-FRI - weekly, anchored on Friday
pd.date_range(start="2016-01-01",end="2016-01-16",freq="H") # H - hourly (newer pandas versions prefer lowercase "h")
pd.date_range(start="2016-01-01",end="2016-10-16",freq="M") # M - month end (newer pandas versions prefer "ME")

pd.date_range(start="2012-09-09",periods=25,freq="D")  # periods = number of dates to generate
pd.date_range(start="2012-09-09",periods=4,freq="W")  # four weekly dates
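
As a quick check on what these frequencies produce, the sketch below counts the dates in a few of the ranges above. The variable names are my own; 1 January 2016 fell on a Friday.

```python
import pandas as pd

daily = pd.date_range(start="2016-01-01", end="2016-01-11", freq="D")
every_other = pd.date_range(start="2016-01-01", end="2016-01-11", freq="2D")
weekdays = pd.date_range(start="2016-01-01", end="2016-01-11", freq="B")
fridays = pd.date_range(start="2016-01-01", end="2016-01-16", freq="W-FRI")

# With periods instead of end, the range stops after a fixed number of dates
by_periods = pd.date_range(start="2012-09-09", periods=4, freq="W")

print(len(daily), len(every_other), len(weekdays))
print(fridays)
print(by_periods)
```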

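Dataframes or series with a date index can be filtered using **.truncate()**, which cuts off the rows before and/or after the given index values. It works on the index, so the date column is set as the index first.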

In [16]:
dp4k = p4k.set_index('date')
dp4k

Unnamed: 0_level_0,album,artist,best,genre,review,score
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2017-12-06,A.M./Being There,Wilco,1,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0
2017-12-06,No Shame,Hopsin,0,Rap,"On his corrosive fifth album, the rapper takes...",3.5
2017-12-06,Material Control,Glassjaw,0,Rock,"On their first album in 15 years, the Long Isl...",6.6
2017-12-06,Weighing of the Heart,Nabihah Iqbal,0,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7
2017-12-05,The Visitor,Neil Young / Promise of the Real,0,Rock,"While still pointedly political, Neil Youngs ...",6.7
...,...,...,...,...,...,...
1999-01-26,1999,Cassius,0,Electronic,"Well, it's been two weeks now, and I guess it'...",4.8
1999-01-26,Let Us Replay!,Coldcut,0,Electronic,The marketing guys of yer average modern megac...,8.9
1999-01-12,"Singles Breaking Up, Vol. 1",Don Caballero,0,Experimental,"Well, kids, I just went back and re-read my re...",7.2
1999-01-12,Out of Tune,Mojave 3,0,Rock,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3


In [25]:
dp4k.truncate(before="2010-02-05")
dp4k.truncate(after="2017-12-05")

date,album,artist,best,genre,review,score
2017-12-06,A.M./Being There,Wilco,1,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0
2017-12-06,No Shame,Hopsin,0,Rap,"On his corrosive fifth album, the rapper takes...",3.5
2017-12-06,Material Control,Glassjaw,0,Rock,"On their first album in 15 years, the Long Isl...",6.6
2017-12-06,Weighing of the Heart,Nabihah Iqbal,0,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7
2017-12-05,The Visitor,Neil Young / Promise of the Real,0,Rock,"While still pointedly political, Neil Youngs ...",6.7
2017-12-05,Perfect Angel,Minnie Riperton,1,Pop/R&B,Best new reissue A deluxe reissue of Minnie Ri...,9.0
2017-12-05,Everyday Is Christmas,Sia,0,Pop/R&B,Sias shiny Christmas album feels inconsistent...,5.8
2017-12-05,Zaytown Sorority Class of 2017,Zaytoven,0,Rap,The prolific Atlanta producer enlists 17 women...,6.2
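Because `.truncate()` expects a sorted index, it is easiest to reason about on an index in ascending order. The sketch below uses a small made-up frame (the `toy` name and its scores are placeholders, not rows pulled from the dataset) and combines `before` and `after` in a single call.

```python
import pandas as pd

# Hypothetical toy data: one score per day, indexed by date in ascending order
idx = pd.date_range(start="2016-01-01", periods=7, freq="D")
toy = pd.DataFrame({"score": [7.0, 3.5, 6.6, 7.7, 6.7, 9.0, 5.8]}, index=idx)

# Keep only rows from 3 January through 5 January (both bounds are inclusive)
window = toy.truncate(before="2016-01-03", after="2016-01-05")
print(window)
```

For the review data, calling `dp4k.sort_index()` first would put the dates in ascending order before truncating.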


Timedeltas can be calculated from two date or datetime objects using arithmetic

In [43]:
timeA = pd.Timestamp("2016-03-31 04:35:16 PM")
timeB = pd.Timestamp("2016-03-20")

timeA - timeB
timeB - timeA 

Timedelta('-12 days +07:24:44')

In [44]:
pd.Timedelta(days=3) # creating a 3 day timedelta
pd.Timedelta(days=3,hours=2,minutes=23) # 3 days + 2 hours + 23 minutes
pd.Timedelta("14 days 6 hours 12 minutes 49 seconds") # can even be done with a string

Timedelta('3 days 00:00:00')

In [45]:
timeB + pd.Timedelta(days=3)

Timestamp('2016-03-23 00:00:00')
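To make the arithmetic concrete, here is a minimal sketch using the same two timestamps as above (renamed `time_a` and `time_b` only for style). It shows the positive difference, two ways of reading it, and how swapping the operands flips the sign.

```python
import pandas as pd

time_a = pd.Timestamp("2016-03-31 04:35:16 PM")
time_b = pd.Timestamp("2016-03-20")

delta = time_a - time_b           # later minus earlier gives a positive Timedelta
print(delta)                      # 11 days 16:35:16
print(delta.days)                 # whole-day component only
print(delta.total_seconds())      # full length expressed in seconds

# Swapping the operands gives the same length with the opposite sign
print((time_b - time_a) == -delta)
```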

An extension of this is the **pd.DateOffset()** class, which shifts dates by calendar units such as months and years

In [40]:

from pandas.tseries.offsets import * 

# adding five days
p4k.date + pd.DateOffset(days=5)
# subtracting two weeks
p4k.date + pd.DateOffset(weeks=-2)
# adding a year
p4k.date + pd.DateOffset(years=1) 
# multiple units at once: one year forward, three days back
p4k.date + pd.DateOffset(years=1,days=-3) 

1       2018-12-03
2       2018-12-03
3       2018-12-03
4       2018-12-03
5       2018-12-02
           ...    
19551   2000-01-23
19552   2000-01-23
19553   2000-01-09
19554   2000-01-09
19555   2000-01-02
Name: date, Length: 19555, dtype: datetime64[ns]
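The practical difference between a `Timedelta` and a `DateOffset` is that the former adds a fixed span of time while the latter follows the calendar. A small sketch on single timestamps (the dates are chosen only to make the difference visible):

```python
import pandas as pd

jan_31 = pd.Timestamp("2016-01-31")

# Timedelta adds a fixed amount of time
fixed = jan_31 + pd.Timedelta(days=30)       # 2016-03-01

# DateOffset follows the calendar: one month after 31 January
# lands on the last day of February (2016 is a leap year)
calendar = jan_31 + pd.DateOffset(months=1)  # 2016-02-29

# Units can be combined, as in the cell above
combined = pd.Timestamp("2017-12-06") + pd.DateOffset(years=1, days=-3)  # 2018-12-03

print(fixed, calendar, combined)
```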

In [41]:
p4k.date + pd.tseries.offsets.MonthEnd() # rolls forward to the next month end
p4k.date - pd.tseries.offsets.MonthEnd() # rolls back to the previous month end
p4k.date + MonthBegin() # rolls forward to the next beginning of a month
p4k.date + BMonthEnd() # rolls forward to the last business day of the month
p4k.date + QuarterEnd() # rolls forward to the end of the quarter
p4k.date + YearEnd() # rolls forward to the end of the year

1       2017-12-31
2       2017-12-31
3       2017-12-31
4       2017-12-31
5       2017-12-31
           ...    
19551   1999-12-31
19552   1999-12-31
19553   1999-12-31
19554   1999-12-31
19555   1999-12-31
Name: date, Length: 19555, dtype: datetime64[ns]
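One detail worth knowing about anchored offsets such as `MonthEnd()`: a date that already sits on a month end is pushed to the following month end, unless the offset is created with `0`. A minimal sketch on two hand-picked timestamps:

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

mid_month = pd.Timestamp("2017-12-06")
month_end = pd.Timestamp("2017-12-31")

rolled_mid = mid_month + MonthEnd()   # rolls forward to 2017-12-31
rolled_end = month_end + MonthEnd()   # already a month end, so it moves to 2018-01-31
kept_end = month_end + MonthEnd(0)    # MonthEnd(0) leaves an existing month end in place
previous = mid_month - MonthEnd()     # rolls back to 2017-11-30

print(rolled_mid, rolled_end, kept_end, previous)
```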

### 13. Text Functions

_key string methods and functions: .lower(), .upper(), .title(), len()_

_key methods: .replace(), .contains(), .startswith(), .endswith(), .lstrip(), .rstrip(), .strip(), .title(), .split(), .get()_



In [46]:
testString = "Education is what you get from reading the small print; experience is what you get from not reading it."

In [55]:
testString.upper() # all upper case
testString.lower() # all lower case
testString.title() # all first characters capitalized
len(testString) # number of characters

103

In [57]:
testString.replace("reading","slowly reading") # replacing strings

'Education is what you get from slowly reading the small print; experience is what you get from not slowly reading it.'

In [62]:
testBlankString = "   In the middle  "

In [65]:
testBlankString.lstrip() # stripping blank text on the left
testBlankString.rstrip() # stripping blank text on the right
testBlankString.strip() # stripping blank text on both ends

'In the middle'

In [71]:
testBlankString.split() # breaks string apart into individual words
# On a pandas Series, series.str.split(expand=True) returns a df instead of lists; plain Python strings have no 'expand' parameter

['In', 'the', 'middle']
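The examples above work on a single Python string. On a pandas column the same operations are reached through the `.str` accessor, which is where `.contains()`, `.startswith()`, and `.get()` from the list at the top of this section live. A minimal sketch, assuming a small hand-made Series of artist names (the stray whitespace is added on purpose):

```python
import pandas as pd

# Hypothetical sample of artist names, padded with stray whitespace
names = pd.Series(["  Wilco ", "Hopsin", "Neil Young / Promise of the Real"])

clean = names.str.strip()                             # trim both ends
upper = clean.str.upper()                             # upper-case every value
has_young = clean.str.contains("young", case=False)   # boolean Series
starts_w = clean.str.startswith("W")                  # boolean Series
first_word = clean.str.split().str.get(0)             # first token of each name
parts = clean.str.split(" / ", expand=True)           # expand=True returns a DataFrame

print(parts)
```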

### 14. Merging, Joining, and Concatenating

_key functions and methods: pd.concat(), .append(), .merge(), .join()_

In [75]:
# .concat() is used to row bind two datasets, assuming they have the same columns

p4k1 = p4k.sample(n=12) # random sample of 12 rows
p4k2 = p4k.sample(n=5) # random sample of 5 rows

pd.concat([p4k1,p4k2], ignore_index=True) 

Unnamed: 0,album,artist,best,date,genre,review,score
0,River of Souls,Magic Trick,0,2014-01-09,,Tim Cohen wrote the songs for River of Souls w...,6.3
1,I Predict a Graceful Expulsion,Cold Specks,0,2012-05-01,,The 24-year old Canadian singer's transfixing ...,7.7
2,False Flag,Rangda,0,2010-05-18,Experimental,This out-rock supergroup featuring Sir Richard...,8.2
3,Rare Grooves,Various Artists,0,2005-03-17,,Ireland's Bassbin label continues the improbab...,8.2
4,A Tribute to Brother Weldon,Monk Hughes & The Outer Realm,0,2004-07-29,Jazz,It's tough not to love a schizophrenic genius....,5.5
5,Temporary Pleasure,Simian Mobile Disco,0,2009-09-03,Electronic,"A grab bag of styles, SMD's latest features Ho...",6.5
6,Lauschen,Qluster,0,2013-02-15,Electronic,"Hans-Joachim Roedelius (Zodiak Free Arts Lab, ...",6.5
7,Hungry for Nothing,Fight Amp,0,2008-03-20,Metal,This New Jersey band harkens back to a time wh...,7.8
8,Chewed Corners,µ-Ziq,0,2013-07-22,Electronic,Planet Mu's Mike Paradinas has never reached t...,7.1
9,Backlash,Black Joe Lewis & the Honeybears,0,2017-02-07,Rock,Black Joe Lewis trawls the familiar intersecti...,6.0


In [76]:
pd.concat([p4k1,p4k2], ignore_index=False)  # setting to false maintains old index

Unnamed: 0,album,artist,best,date,genre,review,score
4741,River of Souls,Magic Trick,0,2014-01-09,,Tim Cohen wrote the songs for River of Souls w...,6.3
6715,I Predict a Graceful Expulsion,Cold Specks,0,2012-05-01,,The 24-year old Canadian singer's transfixing ...,7.7
8985,False Flag,Rangda,0,2010-05-18,Experimental,This out-rock supergroup featuring Sir Richard...,8.2
15171,Rare Grooves,Various Artists,0,2005-03-17,,Ireland's Bassbin label continues the improbab...,8.2
15885,A Tribute to Brother Weldon,Monk Hughes & The Outer Realm,0,2004-07-29,Jazz,It's tough not to love a schizophrenic genius....,5.5
9808,Temporary Pleasure,Simian Mobile Disco,0,2009-09-03,Electronic,"A grab bag of styles, SMD's latest features Ho...",6.5
5797,Lauschen,Qluster,0,2013-02-15,Electronic,"Hans-Joachim Roedelius (Zodiak Free Arts Lab, ...",6.5
11543,Hungry for Nothing,Fight Amp,0,2008-03-20,Metal,This New Jersey band harkens back to a time wh...,7.8
5263,Chewed Corners,µ-Ziq,0,2013-07-22,Electronic,Planet Mu's Mike Paradinas has never reached t...,7.1
1075,Backlash,Black Joe Lewis & the Honeybears,0,2017-02-07,Rock,Black Joe Lewis trawls the familiar intersecti...,6.0


In [78]:
# append is very similar to concat; it was deprecated in pandas 1.4 and removed in 2.0, where pd.concat() should be used instead
p4k1.append(p4k2, ignore_index=True)

Unnamed: 0,album,artist,best,date,genre,review,score
0,River of Souls,Magic Trick,0,2014-01-09,,Tim Cohen wrote the songs for River of Souls w...,6.3
1,I Predict a Graceful Expulsion,Cold Specks,0,2012-05-01,,The 24-year old Canadian singer's transfixing ...,7.7
2,False Flag,Rangda,0,2010-05-18,Experimental,This out-rock supergroup featuring Sir Richard...,8.2
3,Rare Grooves,Various Artists,0,2005-03-17,,Ireland's Bassbin label continues the improbab...,8.2
4,A Tribute to Brother Weldon,Monk Hughes & The Outer Realm,0,2004-07-29,Jazz,It's tough not to love a schizophrenic genius....,5.5
5,Temporary Pleasure,Simian Mobile Disco,0,2009-09-03,Electronic,"A grab bag of styles, SMD's latest features Ho...",6.5
6,Lauschen,Qluster,0,2013-02-15,Electronic,"Hans-Joachim Roedelius (Zodiak Free Arts Lab, ...",6.5
7,Hungry for Nothing,Fight Amp,0,2008-03-20,Metal,This New Jersey band harkens back to a time wh...,7.8
8,Chewed Corners,µ-Ziq,0,2013-07-22,Electronic,Planet Mu's Mike Paradinas has never reached t...,7.1
9,Backlash,Black Joe Lewis & the Honeybears,0,2017-02-07,Rock,Black Joe Lewis trawls the familiar intersecti...,6.0
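For pandas versions where `.append()` is no longer available, the same result can be had from `pd.concat()`. The sketch below uses two tiny made-up frames (`first` and `second` are placeholder names) and also shows the `keys` argument, which records which source each row came from.

```python
import pandas as pd

# Hypothetical toy frames with identical columns
first = pd.DataFrame({"artist": ["Wilco", "Hopsin"], "score": [7.0, 3.5]})
second = pd.DataFrame({"artist": ["Glassjaw"], "score": [6.6]})

# Equivalent of first.append(second, ignore_index=True)
stacked = pd.concat([first, second], ignore_index=True)

# keys= labels each source frame in an outer index level
labelled = pd.concat([first, second], keys=["first", "second"])

print(stacked)
print(labelled)
```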


Joining datasets together on the basis of their row values can be done with **.merge()** or **.join()**

In [82]:
d = {'artist': ['Prince', 'Kanye West'], 'DOB': ['June 7, 1958', 'June 8, 1977']} # test data
df = pd.DataFrame(data=d)

# Inner join example
p4k.merge(df, how="inner", on = "artist")

# p4k.merge(df, how="inner", on = "artist",suffixes=[" - p4k", " - df"]) # Example with identifying suffixes added

Unnamed: 0,album,artist,best,date,genre,review,score,DOB
0,"One Nite Alone, The Aftershow: It Ain't Over!",Prince,1,2016-09-01,Pop/R&B,Best new reissue Originally released as part o...,8.6,"June 7, 1958"
1,"Sign ""O"" the Times",Prince,0,2016-04-30,Pop/R&B,Choosing a single high point from Prince's glo...,10.0,"June 7, 1958"
2,1999,Prince,0,2016-04-30,Pop/R&B,1999 is the greatest album ever made about par...,10.0,"June 7, 1958"
3,Dirty Mind,Prince,0,2016-04-29,Pop/R&B,Princes first fully actualized album is an un...,10.0,"June 7, 1958"
4,Controversy,Prince,0,2016-04-29,Pop/R&B,Controversy emerged in 1981 at a pivotal time ...,9.0,"June 7, 1958"
5,HITNRUN Phase Two,Prince,0,2016-01-08,Pop/R&B,The second of Prince's HITNRUN series is anoth...,4.7,"June 7, 1958"
6,HITNRUN Phase One,Prince,0,2015-09-10,Pop/R&B,"Prince's new effort, exclusive to Jay Z's Tida...",4.5,"June 7, 1958"
7,Planet Earth,Prince,0,2007-07-23,Pop/R&B,So far this year Prince has wowed at the Super...,4.8,"June 7, 1958"
8,Ultimate Prince,Prince,0,2006-09-05,Pop/R&B,This Prince best of covers the Warner Brothers...,8.6,"June 7, 1958"
9,3121,Prince,0,2006-03-20,Pop/R&B,"On his latest release, the rock legend betters...",6.0,"June 7, 1958"


In [86]:
# Outer join example - returns all rows found in either source
p4k.merge(df, how="outer", on = "artist") # in this case, same as left join

# Right join example
p4k.merge(df, how="right", on = "artist") # in this case, same as inner

# Left join example
p4k.merge(df, how="left", on = "artist") # leaves plenty of NaN's in DOB column


Unnamed: 0,album,artist,best,date,genre,review,score,DOB
0,A.M./Being There,Wilco,1,2017-12-06,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0,
1,No Shame,Hopsin,0,2017-12-06,Rap,"On his corrosive fifth album, the rapper takes...",3.5,
2,Material Control,Glassjaw,0,2017-12-06,Rock,"On their first album in 15 years, the Long Isl...",6.6,
3,Weighing of the Heart,Nabihah Iqbal,0,2017-12-06,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7,
4,The Visitor,Neil Young / Promise of the Real,0,2017-12-05,Rock,"While still pointedly political, Neil Youngs ...",6.7,
...,...,...,...,...,...,...,...,...
19550,1999,Cassius,0,1999-01-26,Electronic,"Well, it's been two weeks now, and I guess it'...",4.8,
19551,Let Us Replay!,Coldcut,0,1999-01-26,Electronic,The marketing guys of yer average modern megac...,8.9,
19552,"Singles Breaking Up, Vol. 1",Don Caballero,0,1999-01-12,Experimental,"Well, kids, I just went back and re-read my re...",7.2,
19553,Out of Tune,Mojave 3,0,1999-01-12,Rock,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3,
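On the full dataset the four join types are hard to tell apart by eye, so here is a minimal sketch on toy data. The `reviews` and `births` frames are stand-ins for `p4k` and `df`; the `indicator=True` argument adds a `_merge` column that records where each row came from.

```python
import pandas as pd

# Hypothetical toy stand-ins for p4k and df
reviews = pd.DataFrame({
    "artist": ["Prince", "Prince", "Wilco"],
    "score": [10.0, 8.6, 7.0],
})
births = pd.DataFrame({
    "artist": ["Prince", "Kanye West"],
    "DOB": ["June 7, 1958", "June 8, 1977"],
})

inner = reviews.merge(births, how="inner", on="artist")   # only artists in both
left = reviews.merge(births, how="left", on="artist")     # every review, DOB where known
right = reviews.merge(births, how="right", on="artist")   # every artist in births
outer = reviews.merge(births, how="outer", on="artist", indicator=True)

print(outer["_merge"].value_counts())
```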


In [91]:
# If columns do not have the same names, use 'left_on' , 'right_on' parameters:
df.rename(columns = {"artist" : "artist_Name", "DOB" : "DOB"}, inplace=True) #  columns are renamed with a dictionary of old name : new name pairs
df

Unnamed: 0,artist_Name,DOB
0,Prince,"June 7, 1958"
1,Kanye West,"June 8, 1977"


In [96]:
# make use of left_on and right_on arguments
p4k.merge(df, how="right", left_on = "artist", right_on = "artist_Name") # right join: keeps every artist in df along with their matching reviews

Unnamed: 0,album,artist,best,date,genre,review,score,artist_Name,DOB
0,"One Nite Alone, The Aftershow: It Ain't Over!",Prince,1,2016-09-01,Pop/R&B,Best new reissue Originally released as part o...,8.6,Prince,"June 7, 1958"
1,"Sign ""O"" the Times",Prince,0,2016-04-30,Pop/R&B,Choosing a single high point from Prince's glo...,10.0,Prince,"June 7, 1958"
2,1999,Prince,0,2016-04-30,Pop/R&B,1999 is the greatest album ever made about par...,10.0,Prince,"June 7, 1958"
3,Dirty Mind,Prince,0,2016-04-29,Pop/R&B,Princes first fully actualized album is an un...,10.0,Prince,"June 7, 1958"
4,Controversy,Prince,0,2016-04-29,Pop/R&B,Controversy emerged in 1981 at a pivotal time ...,9.0,Prince,"June 7, 1958"
5,HITNRUN Phase Two,Prince,0,2016-01-08,Pop/R&B,The second of Prince's HITNRUN series is anoth...,4.7,Prince,"June 7, 1958"
6,HITNRUN Phase One,Prince,0,2015-09-10,Pop/R&B,"Prince's new effort, exclusive to Jay Z's Tida...",4.5,Prince,"June 7, 1958"
7,Planet Earth,Prince,0,2007-07-23,Pop/R&B,So far this year Prince has wowed at the Super...,4.8,Prince,"June 7, 1958"
8,Ultimate Prince,Prince,0,2006-09-05,Pop/R&B,This Prince best of covers the Warner Brothers...,8.6,Prince,"June 7, 1958"
9,3121,Prince,0,2006-03-20,Pop/R&B,"On his latest release, the rock legend betters...",6.0,Prince,"June 7, 1958"


In [98]:
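# The .join() method requires a lot less code, but by default it matches rows on the index, not on a shared column,
# so here df's rows (index 0 and 1) line up with p4k's index labels rather than with artist names
df.rename(columns = {"artist" : "artist", "DOB" : "DOB"}, inplace=True) #  this rename changes nothing, so the column stays artist_Name
p4k.join(df) 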

Unnamed: 0,album,artist,best,date,genre,review,score,artist_Name,DOB
1,A.M./Being There,Wilco,1,2017-12-06,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0,Kanye West,"June 8, 1977"
2,No Shame,Hopsin,0,2017-12-06,Rap,"On his corrosive fifth album, the rapper takes...",3.5,,
3,Material Control,Glassjaw,0,2017-12-06,Rock,"On their first album in 15 years, the Long Isl...",6.6,,
4,Weighing of the Heart,Nabihah Iqbal,0,2017-12-06,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7,,
5,The Visitor,Neil Young / Promise of the Real,0,2017-12-05,Rock,"While still pointedly political, Neil Youngs ...",6.7,,
...,...,...,...,...,...,...,...,...,...
19551,1999,Cassius,0,1999-01-26,Electronic,"Well, it's been two weeks now, and I guess it'...",4.8,,
19552,Let Us Replay!,Coldcut,0,1999-01-26,Electronic,The marketing guys of yer average modern megac...,8.9,,
19553,"Singles Breaking Up, Vol. 1",Don Caballero,0,1999-01-12,Experimental,"Well, kids, I just went back and re-read my re...",7.2,,
19554,Out of Tune,Mojave 3,0,1999-01-12,Rock,"Out of Tune is a Steve Martin album. Yes, I'll...",6.3,,
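To get `.join()` to match on artist names rather than on row labels, one option is to move the key into the index of the right-hand frame and pass `on=` for the left-hand column. A minimal sketch on toy frames (`reviews` and `births` are placeholder names standing in for `p4k` and `df`):

```python
import pandas as pd

# Hypothetical toy stand-ins for p4k and df
reviews = pd.DataFrame({
    "artist": ["Prince", "Wilco", "Kanye West"],
    "score": [10.0, 7.0, 9.5],
})
births = pd.DataFrame({
    "artist_Name": ["Prince", "Kanye West"],
    "DOB": ["June 7, 1958", "June 8, 1977"],
})

# Move the key into the index of the right-hand frame, then tell join()
# which column of the left-hand frame to match against that index
joined = reviews.join(births.set_index("artist_Name"), on="artist")

print(joined)
```

The default is a left join, so artists without a match keep their row and receive a missing DOB.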


### 15. Reshaping dataframes
_key methods: .pivot(), .pivot_table(), .melt()_


In [141]:
# .pivot() returns a reshaped dataframe as a pivot table. Uses unique values from specified index and columns.
# does not support aggregation

p4ksubset = p4k[p4k['artist'].isin(["Kanye West","Nas","Coldplay"])] 
p4ksubset = p4ksubset[['artist','album','best','score']]
p4ksubset

Unnamed: 0,artist,album,best,score
501,Coldplay,Kaleidoscope,0,5.8
2315,Kanye West,The Life of Pablo,1,9.0
2493,Coldplay,A Head Full of Dreams,0,4.8
4279,Coldplay,Ghost Stories,0,4.4
5369,Kanye West,Yeezus,1,9.5
5884,Nas,Illmatic,1,10.0
6458,Nas,Life Is Good,1,8.3
7297,Coldplay,Mylo Xyloto,0,7.0
8370,Kanye West,My Beautiful Dark Twisted Fantasy,1,10.0
9436,Kanye West,VH1 Storytellers,0,4.9


In [143]:
p4ksubset.pivot(index='album', columns='artist', values='best')

artist,Coldplay,Kanye West,Nas
album,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
808s and Heartbreak,,0.0,
A Head Full of Dreams,0.0,,
A Rush of Blood to the Head,0.0,,
Ghost Stories,0.0,,
God's Son,,,0.0
Graduation,,1.0,
Greatest Hits,,,0.0
Hip Hop Is Dead,,,0.0
Illmatic,,,1.0
Kaleidoscope,0.0,,


In [145]:
# .pivot_table() - parameters values, index, columns, and aggfunc (aggregate function)
p4ksubset.pivot_table(values='score', index='artist', aggfunc='mean')

# other aggfunc arguments include count, sum, min, max

artist,score
Coldplay,5.727273
Kanye West,8.425
Nas,7.375


In [None]:
# pivot table with multiple indexes

In [146]:
p4ksubset.pivot_table(values='score', index=['artist','best'], aggfunc='mean') 

artist,best,score
Coldplay,0,5.727273
Kanye West,0,6.25
Kanye West,1,9.15
Nas,0,6.783333
Nas,1,9.15
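The note that `.pivot()` does not support aggregation matters as soon as an index/column pair repeats. The sketch below, on a made-up three-row frame, shows `.pivot()` failing on such data while `.pivot_table()` aggregates it; passing a list to `aggfunc` returns one block of columns per function.

```python
import pandas as pd

# Hypothetical toy data: two reviews share the same (artist, best) pair
toy = pd.DataFrame({
    "artist": ["Nas", "Nas", "Coldplay"],
    "best": [1, 1, 0],
    "score": [10.0, 8.3, 5.8],
})

# pivot() cannot reshape when an index/column pair appears more than once
try:
    toy.pivot(index="artist", columns="best", values="score")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() resolves the duplicates by aggregating them
table = toy.pivot_table(values="score", index="artist", columns="best",
                        aggfunc=["mean", "count"])

print(pivot_failed)
print(table)
```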


In [153]:
## .melt() is essentially the inverse of pivoting: it converts a wide table into a longer one


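The `.melt()` cell above carries no code, so here is a minimal sketch of what a call might look like. The `wide` frame and its year columns are invented for illustration and do not come from the dataset.

```python
import pandas as pd

# Hypothetical wide table: one row per artist, one column per year
wide = pd.DataFrame({
    "artist": ["Coldplay", "Nas"],
    "2011": [7.0, None],
    "2012": [None, 8.3],
})

# id_vars stay as identifier columns; every other column is unpivoted
# into a (variable, value) pair, renamed here via var_name and value_name
long = wide.melt(id_vars="artist", var_name="year", value_name="score")

print(long)
```

Each (artist, year) combination becomes its own row, including the missing ones, which can then be removed with `.dropna()`.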



# OLD NOTES BELOW


### 1.Series

_A note on attributes and methods:_
An attribute is something that is bound to an object, while a method is a procedure or action. Also, attributes have no parentheses, while methods require them
    
#### Series attributes and methods, explanations where necessary:
    
   - series.head() - _first rows (5 by default)_
   - series.tail() - _last rows (5 by default)_
   - len(series) - _Return length of series including NA/null observations_
   - sorted(series) - _Sorts values_
   - list(series) - _Converts series to a list_
   - dict(series) - _Turns the series into a dictionary object where the existing index becomes the dictionary key_
   - min(series) - _For strings, will return first value sorted alphabetically_
   - max(series) - _For strings, will return last value sorted alphabetically_
   - series.values - _values attribute_
   - series.index - _index attribute_
   - series.dtype - _data type_
   - series.is_unique - _Returns True if every value is unique_
   - series.shape - _dimensions of series/dataframe_
   - series.size - _number of elements (rows*columns)_
   - series.count() - _Returns number of non-NA/null observations_
   - series.name - _name of the series_
   - series.sort_values(inplace=True) - _sorts values, inplace=True replaces original values with sorted ones_
   - series.sort_index(inplace=True) - _sorts index, inplace=True replaces the original order with the sorted one_
   - "Value" in series - _checks the index labels; use "Value" in series.values to check the values_
   - series[n] - _returns the element whose index label is n (series.iloc[n] selects by position)_
   - series['index label'] - _returns element by index value name_
   - series.sum()
   - series.mean()
   - series.std()
   - series.min()
   - series.max()
   - series.median()
   - series.mode()
   - series.describe() - _Similar to summary() in R, returns key descriptive stats_
   - series.idxmax() - _Return the row label of the maximum value._
   - series.idxmin() - _Return the row label of the minimum value._
   - series.value_counts() - _Similar to table() in base R_
    
#### Apply method - invokes a function on a series of values

   - [Documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.apply.html)
   - series.apply(FUNCTION, args=(additional arguments,))


#### lambda
   - [Explanation](https://stackabuse.com/lambda-functions-in-python/)
   - In Python, the lambda keyword declares an anonymous (unnamed) function, referred to as a "lambda function". Although they look different syntactically, lambda functions behave in the same way as regular functions declared with the def keyword.
    

#### .map()
   - series.map(MAPPING) - _replaces each value using a dictionary, series, or function_
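A minimal sketch of `.map()` on a hypothetical Series of genres, once with a dictionary and once with a lambda function. The abbreviations in the dictionary are made up for the example.

```python
import pandas as pd

# Hypothetical toy Series of genres
genres = pd.Series(["Rock", "Rap", "Rock", "Jazz"])

# A dictionary maps each value to its replacement; values without a key become NaN
short = genres.map({"Rock": "RK", "Rap": "RP"})

# A function (here a lambda) is applied to each value in turn
lengths = genres.map(lambda g: len(g))

print(short)
print(lengths)
```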

In [41]:
range(12)

range(0, 12)

In [42]:
artists = p4k["artist"]
scores = p4k["score"]
p4k.head()

Unnamed: 0,album,artist,best,date,genre,review,score
1,A.M./Being There,Wilco,1,December 6 2017,Rock,Best new reissue 1 / 2 Albums Newly reissued a...,7.0
2,No Shame,Hopsin,0,December 6 2017,Rap,"On his corrosive fifth album, the rapper takes...",3.5
3,Material Control,Glassjaw,0,December 6 2017,Rock,"On their first album in 15 years, the Long Isl...",6.6
4,Weighing of the Heart,Nabihah Iqbal,0,December 6 2017,Pop/R&B,"On her debut LP, British producer Nabihah Iqba...",7.7
5,The Visitor,Neil Young / Promise of the Real,0,December 5 2017,Rock,"While still pointedly political, Neil Youngs ...",6.7


In [43]:
## Methods and Attributes

artists.count() 
artists.value_counts() 
artists.head() # methods need parentheses to be called
artists.tail()
len(artists)
sorted(artists)  
list(artists) 
dict(artists) 
min(artists)  
max(artists) 
artists.values 
artists.index
artists.dtype
artists.is_unique 
artists.shape 
artists.size 
artists.name 
artists.sort_values()  
artists.sort_index() 
"David Bowie" in artists.values # the in operator checks index labels unless .values is used
artists[100] 
#artists['David Bowie'] 
scores.sum()
scores.mean()
scores.std()
scores.min()
scores.max()
scores.median()
scores.mode()
scores.describe() 
scores.idxmax() # index label of the highest score
scores.idxmin() # index label of the lowest score

12223
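A note on the `in` operator: applied directly to a Series it tests the index labels, not the values. A minimal sketch with a hypothetical three-row Series:

```python
import pandas as pd

# Hypothetical Series with integer index labels, like the artist column
artists = pd.Series(["Wilco", "David Bowie", "Hopsin"], index=[1, 2, 3])

label_check = "David Bowie" in artists                # False: not an index label
index_check = 2 in artists                            # True: 2 is an index label
value_check = "David Bowie" in artists.values         # True: membership among the values
isin_check = artists.isin(["David Bowie"]).any()      # True: same check with .isin()

print(label_check, index_check, value_check, isin_check)
```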

In [44]:
## Apply Method - invokes a function on a series of values

# returns the character at position n of a string (positions start at 0), or '' if the string is too short
def n_char(string,n):
    if len(string)<n+1:
        return ''
    else:
        return string[n]
    
 
## Returning the character at position 3 of each artist name: 
artists.apply(n_char, args=(3,))



1        c
2        s
3        s
4        i
5        l
        ..
19551    s
19552    d
19553     
19554    a
19555    l
Name: artist, Length: 19555, dtype: object

In [45]:
## Lambda - returns the first character of each artist name
artists.apply(lambda x: x[0])

1        W
2        H
3        G
4        N
5        N
        ..
19551    C
19552    C
19553    D
19554    M
19555    N
Name: artist, Length: 19555, dtype: object

### 2. Data Frames

#### Basic Information
   - df.shape
   - df.dtypes
   - df.columns
   - df.axes
   - df.info()
   - df.sum(axis=0) or df.sum(axis=1) - _sums down each column (axis=0) or across each row (axis=1)_
    
#### Selecting column(s)
    
   - df["c1"] or df.c1, df[["c1","c2"]]
    
#### Adding a new column
    
   - df["newCol"] = {value}
 
#### Broadcasting Operations
   - df[value].add(5) or df[value] + 5 - _missing values stay NA; .add() also accepts fill_value to substitute them first_
   - df[value].mul(3) 
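A minimal sketch of broadcasting on a hypothetical Series with one missing value. It compares the `+` operator with `.add()`, including the `fill_value` argument that replaces missing entries before the arithmetic is done.

```python
import pandas as pd

# Hypothetical scores with one missing value
scores = pd.Series([7.0, None, 6.6])

plus_operator = scores + 5                  # the missing value stays missing
add_method = scores.add(5)                  # same result as the operator
add_filled = scores.add(5, fill_value=0)    # missing value is treated as 0 first
tripled = scores.mul(3)                     # multiplication broadcasts the same way

print(add_filled)
```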

#### Dropping Rows with Null Values  
  
   - df.dropna() - _drops any observation with an NA value. Similar to filtering with R's complete.cases()_
   - df.dropna(how="all") - _only drops rows with all NA values_
   - df.dropna(axis=1) - _drops columns with any NA values_
    
   - df.fillna(value=0) - _fills all NA values in the dataframe with 0_
   - df["column1"].fillna(0,inplace=True) - _column by column approach_
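The difference between the `dropna()` variants is easiest to see on a tiny frame. The sketch below uses made-up data; the fill values `"Unknown"` and `0` are arbitrary choices for the example.

```python
import pandas as pd

# Hypothetical frame: row 0 is complete, row 1 is entirely missing, row 2 lacks a genre
toy = pd.DataFrame({
    "genre": ["Rock", None, None],
    "score": [7.0, None, 6.6],
})

any_na = toy.dropna()             # drops rows with at least one NA
all_na = toy.dropna(how="all")    # drops only rows where everything is NA
no_na_cols = toy.dropna(axis=1)   # drops columns containing any NA

# A dictionary fills each column with its own value
filled = toy.fillna({"genre": "Unknown", "score": 0})

# Column-by-column approach, assigning the result back
toy["score"] = toy["score"].fillna(0)

print(filled)
```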
    
    
#### Converting Types using .astype() method
   - df["Float_Score"].astype("int") - _converts FLOAT to INT. Note that there is no inplace arg, so assign the result back_
   - .astype("category") - _can be used to convert a string to a R factor-like variable. Saves space._  
    
#### Sorting/Ranking Values
   - df.sort_values(["Col1","Col2"], ascending=[True,False])
   - df.rank() - _provides rankings as floats by default, because tied values share their average rank_
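A minimal sketch of sorting on two columns and of ranking, using a made-up three-row frame. The last line shows why ranks are floats: two tied values share the average of the ranks they occupy.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "artist": ["Nas", "Coldplay", "Nas"],
    "score": [8.3, 5.8, 10.0],
})

# Artist ascending, then score descending within each artist
ordered = toy.sort_values(["artist", "score"], ascending=[True, False])

# Rank 1.0 goes to the highest score when ascending=False
toy["rank"] = toy["score"].rank(ascending=False)

# Ties share the average of the ranks they occupy
ties = pd.Series([5, 5, 9]).rank()

print(ordered)
print(toy)
print(ties)
```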
    
#### Filtering based upon a condition
   - df["Col1"]=="Value" or df["Col1"]<=22 will return a boolean series
   - df[df["Col1"]=="Value"] will return a filtered dataset
   - Alternatively, filter1 = df["Col1"]=="Value", df[filter1]
   - Conditions can be strung together with AND (&), OR (|)
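A minimal sketch of combining conditions on a made-up frame. When comparisons are written inline, each one needs its own parentheses, because `&` and `|` bind more tightly than `==` and `>=`.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "artist": ["Wilco", "Hopsin", "Glassjaw", "Nas"],
    "genre": ["Rock", "Rap", "Rock", "Rap"],
    "score": [7.0, 3.5, 6.6, 10.0],
})

is_rock = toy["genre"] == "Rock"    # boolean Series
high = toy["score"] >= 7            # boolean Series

rock_and_high = toy[is_rock & high]                                # both conditions
rap_or_high = toy[(toy["genre"] == "Rap") | (toy["score"] >= 7)]   # either condition
not_rock = toy[~is_rock]                                           # negation with ~

print(rock_and_high)
```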
    
#### .isin() Method

   - df["Col1"].isin(["Value1","Value2"]) can be used to filter/extract rows in a dataframe
    
#### .isnull(), .notnull() Methods
   - df["Col1"].isnull() - _produces a boolean series where Col1 value is null_
   - df["Col1"].notnull() - _produces a boolean series where Col1 value is NOT null_

#### .between() Method
   - df["Col1"].between(200,300) - _returns a boolean series of observations falling between 200 and 300, inclusive. Works on times, dates, and numerics_
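The three helpers above all return a boolean series that can be used as a filter. A minimal sketch on made-up data with one missing score:

```python
import pandas as pd

# Hypothetical toy data with one missing score
toy = pd.DataFrame({
    "artist": ["Nas", "Coldplay", "Wilco", "Hopsin"],
    "score": [8.3, 5.8, None, 3.5],
})

picked = toy[toy["artist"].isin(["Nas", "Wilco"])]   # rows matching any listed value
missing = toy[toy["score"].isnull()]                 # rows where score is missing
present = toy[toy["score"].notnull()]                # rows where score is present
mid = toy[toy["score"].between(3.5, 6)]              # both bounds are inclusive

print(mid)
```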
   
#### .duplicated() Method
   - df["Col1"].duplicated(keep="first") - _Returns a boolean series flagging duplicates without removing them; by default the first occurrence is not flagged. keep=False flags every observation that has a duplicate_
   
#### .drop_duplicates() Method
   - df.drop_duplicates() - _Removes rows that are duplicated across all columns, whereas the .duplicated() method above only flags them._
   - df.drop_duplicates(subset=["Col1"], keep = "first") - _Can be applied to specific columns_
   
#### .unique() Method
   - df["Col1"].unique() - _returns the unique values of one column_

#### .nunique() Method
   - df.nunique() - _counts unique values in each column_
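A minimal sketch contrasting the four methods on a hypothetical Series with a repeated artist:

```python
import pandas as pd

# Hypothetical Series with a repeated value
artists = pd.Series(["Nas", "Prince", "Nas", "Nas"])

flags_first = artists.duplicated()            # flags repeats after the first occurrence
flags_all = artists.duplicated(keep=False)    # flags every value that has a duplicate
deduped = artists.drop_duplicates()           # removes the flagged repeats
uniques = artists.unique()                    # array of distinct values
n_unique = artists.nunique()                  # number of distinct values

print(list(flags_first), list(flags_all), list(deduped), n_unique)
```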
 
#### .set_index() Method
   - df.set_index("Col1") - _replaces existing index with values from a column_
   
#### .reset_index() Method
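   - df.reset_index() - _resets to the default integer index and keeps the old index as a column, getting back to the original after .set_index()_
   - df.reset_index(drop=True) - _resets index and drops the old index values_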
   
#### Retrieving Rows by Index Label with .loc[]
   - .loc uses brackets, not parentheses
   - df.loc["indexLabel"] - _Retrieves row with specific index label_
   
#### Retrieving Rows by Index Position with .iloc[]
   - df.iloc[100] - _retrieves row with specific index number(s)_
   - df.iloc[60:120] - _retrieving a range_
   - df.iloc[12,1:3] - _retrieving a certain row, multiple columns_

#### Identifying Individual cells, setting new values
   - df.iloc[0, 1] = "New Value" - _a single = assigns; == would only compare_
   - df.iloc[2, 0:] = "New row value" - _overwrites every cell in the row_
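A minimal sketch of label-based and position-based access, followed by assignment to individual cells. The frame and its letter index are made up for the example.

```python
import pandas as pd

# Hypothetical toy data with string index labels
toy = pd.DataFrame(
    {"artist": ["Wilco", "Hopsin", "Glassjaw"], "score": [7.0, 3.5, 6.6]},
    index=["a", "b", "c"],
)

by_label = toy.loc["b"]        # row with index label "b"
by_position = toy.iloc[1]      # second row, whatever its label
block = toy.iloc[0:2, 0:1]     # first two rows, first column

# Assignment uses a single equals sign
toy.iloc[0, 1] = 7.5
toy.loc["c", "artist"] = "GLASSJAW"

print(toy)
```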

#### Renaming Index Labels or Columns in a Dataframe
   - df.rename(columns = {"Col1" : "NewCol1", "Col2" : "NewCol2"}, inplace=True) - _columns are renamed with a dictionary_


#### Deleting Rows or Columns from a Dataframe
   - df.drop("Row1", axis=0) - _drops a row by name_
   - df.drop("Col1", axis=1) - _drop a column_
   - del df["Col1"] - _alternative method_
   
#### Random Samples with .sample() Method
   - df.sample(n) - _sample n random rows_
   - df.sample(frac=0.25) - _samples a random 25%_
   - df.sample(n, axis=1) - _samples columns instead of rows; axis=0 (the default) samples rows_
   
#### The .nsmallest() and .nlargest() Methods
   - df.nsmallest(n=3, columns="Col1") - _returns the 3 rows with the smallest values in Col1_
   - df.nlargest(n=3, columns="Col1") - _returns the 3 rows with the largest values in Col1_
   
   
#### Filtering with the .where() Method
   - df.where(df["Col1"]=="Value") - _returns the original data frame with NaNs in rows that don't meet the filtering criteria_
   
#### The .query() Method
   - df.query('Col1 == "Value"') - _Similar to filter, only returns matching rows; the whole condition is passed as one string_
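A minimal sketch contrasting `.where()` and `.query()` on made-up data. The `min_score` variable is hypothetical and only shows how `@` pulls a Python variable into the query string.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "artist": ["Wilco", "Hopsin", "Glassjaw", "Nas"],
    "genre": ["Rock", "Rap", "Rock", "Rap"],
    "score": [7.0, 3.5, 6.6, 10.0],
})

# where() keeps the original shape and blanks out the rows that fail the test
masked = toy.where(toy["score"] >= 6.6)

# query() returns only the matching rows
queried = toy.query("score >= 6.6")

# Python variables can be referenced inside the query string with @
min_score = 6.6
rock_high = toy.query('genre == "Rock" and score >= @min_score')

print(masked)
print(rock_high)
```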
   

#### .copy() Method
   - df.copy() - _create a copy of the object's indices and data_
   
   

### 3. Strings

#### Common methods   
   - string.lower()
   - string.upper()
   - string.title() - _Capitalizes first letter of each word_
   - len(string)
   - string.strip() - _Strips white space_
   - string.lstrip() - _Strips white space on the left_
   - string.rstrip() - _Strips white space on the right_
   

#### .str.replace() method
   - "Hello world".replace("l","!") - _Two arguments: pattern, substitute. On a column the equivalent is df["Col1"].str.replace("l","!")_
   
   
#### Filtering with string methods   
   - df["Col1"].str.lower().str.contains("water") - _Searches Col1 for strings that contain 'water'_
   - Other alternate searches: str.startswith(), str.endswith()

#### Splitting strings by characters 
   - "Hello my name is Ravi".split(" ") # single arg is the delimiter/sep
   - **Expand parameter**: df[["First Name", "Last Name"]] = df["Name"].str.split(",", expand=True) - _Breaks apart Name into first and last name columns_
   - **n parameter** - _n equals the maximum number of splits_



### 4. Multi Index
   
#### Creating a multi-index with set_index() 
   - df.set_index(keys=["Col1","Col2"], inplace=True) - _Creates multi-level index_
    
#### The .get_level_values() Method
   - df.index.get_level_values(0) - _returns the values of one index level, chosen by position or name_

#### The .set_names() Method
   - df.index.set_names(["Name1","Name2"]) - _renames index levels_
   
#### The sort_index() Method
   - df.sort_index(ascending=[True,False]) - _sorts indexes_

#### The .transpose() method and MultiIndex on Column Level
   - dfT = df.transpose() - _transposes the dataframe, including indexes_ 

#### The .swaplevel() Method
   - df.swaplevel() - _swaps levels of multi-index_
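The multi-index methods above can be strung together as in this minimal sketch. The frame is made up, and the capitalised level names are arbitrary.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "artist": ["Nas", "Nas", "Coldplay"],
    "best": [1, 0, 0],
    "score": [10.0, 6.8, 5.8],
})

# Two columns become a two-level index, then the index is sorted
multi = toy.set_index(["artist", "best"]).sort_index()

# Values of a single level, selected by name
artists_level = multi.index.get_level_values("artist")

# Rename the levels, then swap their order
multi.index = multi.index.set_names(["Artist", "Best"])
swapped = multi.swaplevel()

# A tuple selects one row across both levels
nas_best = multi.loc[("Nas", 1), "score"]

print(swapped)
```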

#### The .stack() Method
   - Similar to R's tidyr::gather()
   - [Official documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.stack.html)

#### The .unstack() Method 
   - Similar to R's tidyr::spread()
   - 
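A minimal sketch of `.stack()` and `.unstack()` as a round trip, on a made-up wide table with one column per year (the scores are placeholders):

```python
import pandas as pd

# Hypothetical wide table: artists as rows, years as columns
wide = pd.DataFrame(
    {"2011": [7.0, 6.1], "2012": [4.4, 8.3]},
    index=["Coldplay", "Nas"],
)

# stack() moves the column labels into an inner index level (wide to long)
long = wide.stack()

# unstack() moves the inner index level back out into columns (long to wide)
back = long.unstack()

print(long)
print(back)
```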

#### The .pivot() Method
   - df.pivot(index="Col1", columns="Col2", values="Col3") - _Returns reshaped DataFrame organized by given index/column names_ 
   - [Official documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot.html)

#### The .pivot_table() Method
   - df.pivot_table(index="Col1", columns="Col2", values="Col3", aggfunc="mean") - _Create a spreadsheet-style pivot table as a DataFrame_
   - [Official documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html)

#### The pd.melt() Method
   - Essentially the inverse of pivot_table. Converting into a longer table
   - [Official documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.melt.html)


### 5. Group by


### 6. Merging, Joining, Concatenating
   - 
   - 
   - 
   - 
   - 
