# Introduction to pandas

In this notebook, you'll learn the basics of reading in and exploring data with the `pandas` library.

### Import the `pandas` library, aliased as `pd`.

In [1]:
import pandas as pd

### Let's explore these pandas methods, attributes, and accessors
 
**Methods of Inspecting** 
 * .head()
 * .tail()
 * .shape
 * .info()

**Methods of Modifying**
 * .drop()
 * renaming columns
 
**Methods of Summarizing**
 * .unique()
 * .nunique()
 * .value_counts()

**Methods of Slicing and Filtering**
 * .loc[]

## Step 1: Reading in Data and Initial Inspection

In [2]:
art = pd.read_csv('../data/public_art.csv')

To inspect a portion of the dataframe, you can use `.head()` (to see the first few rows) or `.tail()` (to see the last few rows).

In [5]:
art.head()

Unnamed: 0,Title,Last Name,First Name,Location,Medium,Type,Description,Latitude,Longitude,Mapped Location
0,[Cross Country Runners],Frost,Miley,"4001 Harding Rd., Nashville TN",Bronze,Sculpture,,36.12856,-86.8366,"(36.12856, -86.8366)"
1,[Fourth and Commerce Sculpture],Walker,Lin,"333 Commerce Street, Nashville TN",,Sculpture,,36.16234,-86.77774,"(36.16234, -86.77774)"
2,12th & Porter Mural,Kennedy,Kim,114 12th Avenue N,Porter all-weather outdoor paint,Mural,Kim Kennedy is a musician and visual artist wh...,36.1579,-86.78817,"(36.1579, -86.78817)"
3,A Splash of Color,Stevenson and Stanley and ROFF (Harroff),Doug and Ronnica and Lynn,616 17th Ave. N.,"Steel, brick, wood, and fabric on frostproof c...",Mural,Painted wooden hoop dancer on a twenty foot po...,36.16202,-86.79975,"(36.16202, -86.79975)"
4,A Story of Nashville,Ridley,Greg,"615 Church Street, Nashville TN",Hammered copper repousse,Frieze,"Inside the Grand Reading Room, this is a serie...",36.16215,-86.78205,"(36.16215, -86.78205)"


In [4]:
art.tail(2)

Unnamed: 0,Title,Last Name,First Name,Location,Medium,Type,Description,Latitude,Longitude,Mapped Location
130,Women Suffrage Memorial,LeQuire,Alan,"600 Charlotte Avenue, Nashville TN",Bronze sculpture,Sculpture,,36.16527,-86.78382,"(36.16527, -86.78382)"
131,Youth Opportunity Center-STARS Nashville - Pea...,Rudloff,Andee,1704 Charlotte Ave.,House paint on vinyl,Mural,,36.15896,-86.799,"(36.15896, -86.799)"


To see the number of rows and columns, you can access the `.shape` attribute. This shows (number of rows, number of columns).

In [6]:
art.shape

(132, 10)
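
As a minimal sketch of the same three tools, here they are on a small, made-up dataframe (the name `toy` and its contents are hypothetical, not part of the public art data). Note that `.head()` and `.tail()` are methods and take an optional row count, while `.shape` is an attribute and takes no parentheses.

```python
import pandas as pd

# Hypothetical toy data standing in for the art dataframe
toy = pd.DataFrame({
    'Title': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'Type': ['Mural', 'Sculpture', 'Mural', 'Frieze', 'Mural', 'Mosaic', 'Sculpture'],
})

first_rows = toy.head()      # first 5 rows by default
last_rows = toy.tail(2)      # last 2 rows
n_rows, n_cols = toy.shape   # .shape is a (rows, columns) tuple, so it can be unpacked
print(n_rows, n_cols)
```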

To get more information about what is contained in each column, you can use `.info()`.

In [7]:
art.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 132 entries, 0 to 131
Data columns (total 10 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Title            132 non-null    object 
 1   Last Name        132 non-null    object 
 2   First Name       122 non-null    object 
 3   Location         131 non-null    object 
 4   Medium           128 non-null    object 
 5   Type             132 non-null    object 
 6   Description      87 non-null     object 
 7   Latitude         132 non-null    float64
 8   Longitude        132 non-null    float64
 9   Mapped Location  132 non-null    object 
dtypes: float64(2), object(8)
memory usage: 10.4+ KB


**What do you notice?**

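You may notice that most of the columns have the `object` dtype. This is the datatype that `pandas` uses here for text data.

The `float64` datatype is a numeric datatype that can handle decimal values.

The non-null counts also show which columns have missing values. For example, `Description` has only 87 non-null entries out of 132 rows.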
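
If you only want part of what `.info()` reports, the pieces can be requested separately. The sketch below uses a made-up dataframe (the name `toy` and its values are hypothetical) to show the `.dtypes` attribute and a count of missing values per column.

```python
import pandas as pd

# Hypothetical toy data with one missing value in the Medium column
toy = pd.DataFrame({
    'Title': ['A', 'B', 'C'],
    'Medium': ['Bronze', None, 'Paint'],
    'Latitude': [36.13, 36.16, 36.15],
})

print(toy.dtypes)             # one dtype per column

missing = toy.isna().sum()    # number of missing values in each column
print(missing)
```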

## Step 2: Making Modifications

Since the Mapped Location information is already contained in the Latitude and Longitude columns, you really don't need to store it twice. You can use the `.drop()` method to get rid of that column.

In [None]:
art.drop(columns='Mapped Location')

In [None]:
art.head(2)

What happened? The column is still there. By default, `.drop()` returns a new dataframe and leaves the original unchanged, and we failed to save that result. We need to assign the result back to the `art` dataframe.

In [None]:
art = art.drop(columns = 'Mapped Location')

In [None]:
art.head(2)
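
The same behavior can be seen on a small, made-up dataframe (the names `toy` and `trimmed` are hypothetical). This is only a sketch of the default behavior of `.drop()`.

```python
import pandas as pd

# Hypothetical toy data with a redundant column
toy = pd.DataFrame({
    'Title': ['A', 'B'],
    'Mapped Location': ['(1, 2)', '(3, 4)'],
})

toy.drop(columns='Mapped Location')             # result is discarded
print(list(toy.columns))                        # the column is still in toy

trimmed = toy.drop(columns='Mapped Location')   # result is kept in a new variable
print(list(trimmed.columns))
```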

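If you are dropping a large number of columns, sometimes it can be easier to just specify which ones you want to keep. This can be done with double square brackets. For example, if we only wanted to keep the title and artist names, we could do this (as with `.drop()`, the result is a new dataframe, and `art` itself is unchanged unless you assign the result back):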

In [None]:
art[['Title', 'Last Name', 'First Name']]
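
The double brackets matter: the inner pair is a Python list of column names. As a rough illustration on made-up data (the names `toy`, `one_col`, and `subset` are hypothetical), single brackets with one name give back a Series, while a list of names gives back a DataFrame.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'Title': ['A', 'B'],
    'Last Name': ['Frost', 'Walker'],
    'Type': ['Sculpture', 'Sculpture'],
})

one_col = toy['Title']                  # single brackets: a Series
subset = toy[['Title', 'Last Name']]    # list of names: a DataFrame
print(type(one_col).__name__, type(subset).__name__)
```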

Let's say you want to rename the columns of the art dataframe. If you are only changing a few columns and keeping the rest the same, you can use the `.rename()` method, passing in a dictionary whose keys are the old names and whose values are the new names.

In [None]:
art = art.rename(columns = {'Latitude': 'lat', 'Longitude': 'lng'})
art.head(2)

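Alternatively, you can just assign a list of column names to the `.columns` attribute of your dataframe. The list must contain exactly one name for every column, in the same order as the existing columns.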

In [None]:
art.columns = ['title', 'last_name', 'first_name', 'location', 'medium', 'type', 'description', 'lat', 'lng']

In [None]:
art.head(2)
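
Typing out every name works for a small dataframe. For a wider one, one possible alternative (not used elsewhere in this notebook) is to build the new names from the old ones with the string methods available on `.columns`. The sketch below assumes the goal is lowercase names with underscores, and the dataframe `toy` is made up.

```python
import pandas as pd

# Hypothetical toy data with the original style of column names
toy = pd.DataFrame({
    'Title': ['A'],
    'Last Name': ['Frost'],
    'First Name': ['Miley'],
})

# Lowercase every name, then swap spaces for underscores
toy.columns = toy.columns.str.lower().str.replace(' ', '_')
print(list(toy.columns))
```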

## Step 3: Exploring and Slicing

What are the different types of artwork in this dataset?

In [None]:
art['type'].unique()

If you only care about the _number_ of unique values in a column, you can use `.nunique()`.

For example, if you want to know the number of artist last names:

In [None]:
art['last_name'].nunique()

Which is the most popular type?

In [None]:
art['type'].value_counts()
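
To compare the three summarizing methods side by side, here is a small sketch on a made-up Series (the name `types` and its values are hypothetical). It also shows two optional extras: `normalize=True`, which returns proportions instead of counts, and `.idxmax()`, which picks out the label with the highest count.

```python
import pandas as pd

# Hypothetical toy column of artwork types
types = pd.Series(['Mural', 'Sculpture', 'Mural', 'Frieze', 'Mural'], name='type')

distinct = types.unique()                      # array of the distinct values
n_distinct = types.nunique()                   # how many distinct values there are
counts = types.value_counts()                  # count per value, largest first
shares = types.value_counts(normalize=True)    # same, as proportions of the total
most_common = counts.idxmax()                  # label with the highest count

print(n_distinct, most_common)
```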

What if you want to see all of the murals? You can filter a DataFrame using `.loc[]` and passing in a conditional expression. Only the rows where the condition is true are kept.

In [None]:
art.loc[art['type'] == 'Mural']

If you want to do further work or exploration with the sliced dataframe, you need to save it to a new variable.

In [None]:
murals = art.loc[art['type'] == 'Mural']
murals.shape
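
It can help to look at the conditional expression on its own. The sketch below uses a made-up dataframe (the names `toy`, `mask`, `murals_only`, and `selected` are hypothetical) to show that the comparison produces a Series of `True`/`False` values, which `.loc[]` then uses to pick rows. As an optional extra, `.loc[]` also accepts a list of columns after the comma.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'title': ['A', 'B', 'C'],
    'type': ['Mural', 'Sculpture', 'Mural'],
    'lat': [36.15, 36.12, 36.16],
})

mask = toy['type'] == 'Mural'               # Boolean Series, one value per row
print(mask.tolist())

murals_only = toy.loc[mask]                 # rows where the mask is True
selected = toy.loc[mask, ['title', 'lat']]  # same rows, only two columns
print(selected)
```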

Who is the most prolific mural painter in Nashville?

In [None]:
murals['last_name'].value_counts()

**Your Turn:** Filter the data to show the murals that were painted by Cooper.

In [None]:
# Your Code Here