# Python Open Labs: Working with multiple datasets in pandas

## Setup
With this Google Colaboratory (Colab) notebook open, click the "Copy to Drive" button that appears in the menu bar. The notebook will then be attached to your own user account, so you can edit it in any way you like -- you can even take notes directly in the notebook.

## Instructors
- Walt Gurley
- Claire Cahoon

## Open Labs agenda

1.   **Guided activity**: One of the instructors will share their screen to work through the guided activity and teach concepts along the way.

2.   **Open lab time**: After the guided portion of the Open Lab, the rest of the time is for you to ask questions, work collaboratively, or have self-guided practice time. You will have access to instructors and peers for questions and support.

Breakout rooms will be available if you would like to work in small groups. If you have trouble joining a room, ask in the chat to be moved into a room.

## Learning objectives

By the end of our workshop today, we hope you'll understand basic pandas methods for loading, combining, and preparing different types of datasets for analysis.

## Today's Topics

- Editing DataFrame index labels and column headers
- Concatenating DataFrames
- Merging DataFrames
- Removing columns from a DataFrame
- Filtering rows in a DataFrame

## Questions during the workshop

Please feel free to ask questions throughout the workshop.

We have a second instructor who will be available during the workshop. They will answer questions as they are able and will collect questions whose answers might help everyone, so those can be answered at the end of the workshop.

The open lab time is when you will be able to ask more questions and work together on the exercises.

## Guided Instruction

In this Open Lab we're introducing how to use the pandas library to load, combine, and prepare multiple datasets for analysis.

In this section, we will work through examples using data from the [Museum of Modern Art (MoMA) research dataset](https://github.com/MuseumofModernArt/collection) containing records of all of the works that have been cataloged in the database of the MoMA collection.

> "The Museum’s website features 89,695 artworks from 26,494 artists. This research dataset contains 138,151 records, representing all of the works that have been accessioned into MoMA’s collection and cataloged in our database. It includes basic metadata for each work, including title, artist, date made, medium, dimensions, and date acquired by the Museum. Some of these records have incomplete information and are noted as “not Curator Approved." - [MoMA Github repository for collection data](https://github.com/MuseumofModernArt/collection)

We have split the dataset into several subsets (paintings, sculptures, photographs, and artist information) saved in different file types (CSV, Excel, and JSON) to use in activities. We will be referencing the data that we have prepared in our [GitHub repository for teaching datasets](https://github.com/ncsu-libraries-data-vis/teaching-datasets/tree/main/moma_data).

```python
# Import the pandas library as pd (callable in our code as pd)
import pandas as pd
```

### Load the datasets

```python
# Import the MoMA paintings dataset (CSV file)
# The file location
paintings_file_url = 'https://raw.githubusercontent.com/ncsu-libraries-data-vis/teaching-datasets/main/moma_data/moma_paintings.csv'

# Read in the file and print out the DataFrame
paintings = pd.read_csv(paintings_file_url)
paintings
```

Unnamed: 0,Index,Title,Artist,ConstituentID,Date,Medium,Dimensions,CreditLine,AccessionNumber,Classification,...,ThumbnailURL,Circumference (cm),Depth (cm),Diameter (cm),Height (cm),Length (cm),Weight (kg),Width (cm),Seat Height (cm),Duration (sec.)
0,32095,"Rope and People, I",Joan Miró,4016.0,"Barcelona, March 27, 1935","Oil on cardboard mounted on wood, with coil of...","41 1/4 x 29 3/8"" (104.8 x 74.6 cm)",Gift of the Pierre Matisse Gallery,71.1936,Painting,...,http://www.moma.org/media/W1siZiIsIjE2MDU0NiJd...,,,,104.800000,,,74.600000,,
1,33167,Fire in the Evening,Paul Klee,3130.0,1929,Oil on cardboard,"13 3/8 x 13 1/4"" (33.8 x 33.3 cm)",Mr. and Mrs. Joachim Jean Aberbach Fund,153.1970,Painting,...,http://www.moma.org/media/W1siZiIsIjE3Njc2NyJd...,,,,33.800000,,,33.300000,,
2,33424,Portrait of an Equilibrist,Paul Klee,3130.0,1927,Oil and collage on cardboard over wood with pa...,"24 7/8 x 15 3/4"" (63.2 x 40 cm)",Mrs. Simon Guggenheim Fund,195.1966,Painting,...,http://www.moma.org/media/W1siZiIsIjE3OTI4NSJd...,,,,60.300000,,,36.800000,,
3,34481,Guitar,Pablo Picasso,4609.0,"Paris, early 1919","Oil, charcoal and pinned paper on canvas","7' 1"" x 31"" (216 x 78.8 cm)",Gift of A. Conger Goodyear,384.1955,Painting,...,http://www.moma.org/media/W1siZiIsIjE1MDQ2MiJd...,,,,215.900000,,,78.700000,,
4,35396,Grandmother,Arthur Dove,1602.0,1925,"Shingles, needlepoint, page from Concordance, ...","20 x 21 1/4"" (50.8 x 54.0 cm)",Gift of Philip L. Goodwin (by exchange),636.1939,Painting,...,http://www.moma.org/media/W1siZiIsIjI0NzA5NCJd...,,,,50.800000,,,54.000000,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2345,138102,Zacimba Gaba,Dalton Paula,132719.0,2020,"Oil, pencil, and gold leaf on two joined canvases","24 × 17 3/4"" (61 × 45.1 cm), in two parts",,TR16514.1,Painting,...,,,0.000000,,60.960122,,,45.085090,,
2346,138103,Zumbi,Dalton Paula,132719.0,2020,"Oil, pencil, and gold leaf on two joined canvases","24 × 17 3/4"" (61 × 45.1 cm), in two parts",,TR16514.2,Painting,...,,,0.000000,,60.960122,,,45.085090,,
2347,138104,Vertigo #2,Ouattara Watts,132954.0,2011,"Acrylic, paper pulp, cut and pasted fabrics, a...","118 1/4 × 165 1/2 × 3 3/4"" (300.4 × 420.4 × 9....",,TR16516,Painting,...,,,9.525019,,300.355601,,,420.370841,,
2348,138105,Lot 111113 (flare 1),Donald Moffett,7435.0,2013,Acrylic and lacquer on linen with cotton and a...,"54 × 44"" (137.2 × 111.8 cm)",,TR16517,Painting,...,,,0.000000,,137.160274,,,111.760224,,


```python
# Import the MoMA photographs dataset (Excel file)
# The file location
photos_file_url = 'https://raw.githubusercontent.com/ncsu-libraries-data-vis/teaching-datasets/main/moma_data/moma_photographs.xlsx'

# Read in the file and print out the DataFrame
photos = pd.read_excel(photos_file_url)
photos
```

Unnamed: 0,Index,Title,Artist,ConstituentID,Date,Medium,Dimensions,CreditLine,AccessionNumber,Classification,...,ThumbnailURL,Circumference (cm),Depth (cm),Diameter (cm),Height (cm),Length (cm),Weight (kg),Width (cm),Seat Height (cm),Duration (sec.)
0,30700,Untitled from VVV Portfolio,David Hare,2504.0,"c. 1941, published 1943",Gelatin silver print mounted on paper from a p...,"composition: 12 x 9 3/4"" (30.5 x 24.8 cm); she...",The Louis E. Stern Collection,1113.1964.6,Photograph,...,http://www.moma.org/media/W1siZiIsIjM0NTUzOCJd...,,,,30.50,,,24.8,,
1,36358,Tuileries Sanglier / d'apres l'antique,Eugène Atget,229.0,1911,Albumen silver print,"8 11/16 × 6 9/16"" (22 × 16.7 cm)",Abbott-Levy Collection. Partial gift of Shirle...,1.1969.1,Photograph,...,http://www.moma.org/media/W1siZiIsIjMwMTMwNCJd...,,,,,,,,,
2,36359,Sapin (Trianon),Eugène Atget,229.0,1910-14,Albumen silver print,"Approx. 7 1/8 × 8 5/8"" (18.1 × 21.9 cm)",Abbott-Levy Collection. Partial gift of Shirle...,1.1969.10,Photograph,...,http://www.moma.org/media/W1siZiIsIjMxODEwMSJd...,,,,,,,,,
3,36360,"Versailles, vase par Ballin",Eugène Atget,229.0,1902,Matte albumen silver print,"Approx. 8 9/16 × 7 1/16"" (21.8 × 18 cm)",Abbott-Levy Collection. Partial gift of Shirle...,1.1969.100,Photograph,...,http://www.moma.org/media/W1siZiIsIjMxODEwMiJd...,,,,,,,,,
4,36361,Facteur,Eugène Atget,229.0,1899-1900,Gelatin silver printing-out-paper print,"Approx. 8 11/16 × 6 9/16"" (22 × 16.7 cm)",Abbott-Levy Collection. Partial gift of Shirle...,1.1969.1000,Photograph,...,http://www.moma.org/media/W1siZiIsIjI4NjE4MiJd...,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
31438,138141,Untitled,Unknown photographer,8595.0,c. 1910,Gelatin silver print,"2 3/8 × 4 1/8"" (6 × 10.5 cm)",Gift of John Jeremiah Sullivan,TR16527.20,Photograph,...,http://www.moma.org/media/W1siZiIsIjQ5MjcxMyJd...,,,,6.00,,,10.5,,
31439,138142,Untitled,Unknown photographer,8595.0,c. 1910,"Gelatin silver print, printed later","8 1/16 × 13 1/16"" (20.5 × 33.2 cm)",Gift of John Jeremiah Sullivan,TR16527.21,Photograph,...,http://www.moma.org/media/W1siZiIsIjQ5MjcxNCJd...,,,,20.50,,,33.2,,
31440,138143,Untitled,Unknown photographer,8595.0,c. 1918-30,Gelatin silver print (postcard),"4 × 3 3/8"" (10.2 × 8.6 cm)",Gift of John Jeremiah Sullivan,TR16527.22,Photograph,...,http://www.moma.org/media/W1siZiIsIjQ5MjcxNiJd...,,,,10.20,,,8.6,,
31441,138144,Untitled,Unknown photographer,8595.0,c. 1900,Gelatin silver print,"6 7/16 × 9 3/4"" (16.4 × 24.7 cm)",Gift of John Jeremiah Sullivan,TR16527.23,Photograph,...,http://www.moma.org/media/W1siZiIsIjQ5MjcxOCJd...,,,,16.36,,,24.7,,


```python
# Import the MoMA sculptures dataset (JSON file)
# The file location
sculptures_file_url = 'https://raw.githubusercontent.com/ncsu-libraries-data-vis/teaching-datasets/main/moma_data/moma_sculptures.json'

# Read in the file and print out the DataFrame
sculptures = pd.read_json(sculptures_file_url)
sculptures
```

Unnamed: 0,Title,Artist,ConstituentID,Date,Medium,Dimensions,CreditLine,AccessionNumber,Classification,Department,...,ThumbnailURL,Circumference (cm),Depth (cm),Diameter (cm),Height (cm),Length (cm),Weight (kg),Width (cm),Seat Height (cm),Duration (sec.)
73418,Surface with Vibrating Texture,Getulio Alviani,137.0,1964,Brushed aluminum on board,"33 x 32 3/4"" (83.6 x 83.2 cm)",Larry Aldrich Foundation Fund,105.1965,Sculpture,Painting & Sculpture,...,http://www.moma.org/media/W1siZiIsIjIwODIwOCJd...,,,,83.600000,,,83.200000,,
73474,IN RELATION TO AN INCREASE IN QUANTITY REGARDL...,Lawrence Weiner,6288.0,1973-74,LANGUAGE + THE MATERIALS REFERRED TO,Dimensions variable,Given anonymously,117.1975,Sculpture,Painting & Sculpture,...,http://www.moma.org/media/W1siZiIsIjMxODk1MSJd...,,,,,,,,,
73564,3 Standard Stoppages,Marcel Duchamp,1634.0,Paris 1913-14,"Wood box 11 1/8 x 50 7/8 x 9"" (28.2 x 129.2 x ...",,Katherine S. Dreier Bequest,149.1953.a-i,Sculpture,Painting & Sculpture,...,http://www.moma.org/media/W1siZiIsIjEzODY0NSJd...,,,,13.300000,,,120.000000,,
73567,To Be Looked at (from the Other Side of the Gl...,Marcel Duchamp,1634.0,Buenos Aires 1918,"Oil, silver leaf, lead wire, and magnifying le...","Overall 22"" (55.8 cm) high",Katherine S. Dreier Bequest,150.1953,Sculpture,Painting & Sculpture,...,http://www.moma.org/media/W1siZiIsIjI0MzI3MyJd...,,,,49.500000,,,39.700000,,
73733,Revolving,Kurt Schwitters,5293.0,1919,"Wood, metal, cord, cardboard, wool, wire, leat...","48 3/8 x 35"" (122.7 x 88.7 cm)",Advisory Committee Fund,231.1968,Sculpture,Painting & Sculpture,...,http://www.moma.org/media/W1siZiIsIjEyMjc3MCJd...,,,,122.700000,,,88.700000,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
136495,Disease Thrower #5,Guadalupe Maravilla,131948.0,2019,"Glass, steel, wood, cotton, plastic, loofah, p...","91 × 55 × 45"" (231.1 × 139.7 × 114.3 cm)",Fund for the Twenty-First Century,703.2019,Sculpture,Painting & Sculpture,...,http://www.moma.org/media/W1siZiIsIjQ3NTM0NiJd...,,114.300229,,231.140462,,,139.700279,,
136496,Circle Serpent,Guadalupe Maravilla,131948.0,2019,Maguey leaves,Dimensions variable,Fund for the Twenty-First Century,704.2019,Sculpture,Painting & Sculpture,...,http://www.moma.org/media/W1siZiIsIjQ3NTM0OSJd...,,,,,,,,,
136526,Candle Piece,Richard Serra,5349.0,1967,Wood and candle,"11 13/16 × 144 1/8 × 3 9/16"" (30 × 366 × 9 cm)",Gift of Peter Freeman,707.2019.a-b,Sculpture,Painting & Sculpture,...,http://www.moma.org/media/W1siZiIsIjQ3NTM1NSJd...,,9.000000,,30.000000,,,366.000000,,
136539,Unfinished Construction. Posthumous Homage to ...,Mathias Goeritz,2203.0,c. 1953,Painted wood,"36 5/8 × 14 15/16 × 28 3/4"" (93 × 38 × 73 cm)",,TR16416,Sculpture,Painting & Sculpture,...,,,73.000000,,93.000000,,,38.000000,,


### Reset DataFrame index labels

The JSON file of sculpture artworks we imported does not include an `Index` column; instead, those values were loaded as the DataFrame's index labels. To match the format of our paintings and photographs datasets, we first need to move these labels back into a regular column by resetting the index of the `sculptures` dataset with the DataFrame method `reset_index()`.
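To see what `reset_index()` does in isolation, here is a minimal sketch on a tiny made-up DataFrame (the titles and labels are toy values chosen for illustration, not read from the MoMA files). Because the index has no name, pandas moves the labels into a new column called `index` and replaces them with a zero-based integer index.

```python
import pandas as pd

# Toy data (hypothetical): the index labels play the role of the "Index" values
toy = pd.DataFrame(
    {"Title": ["Surface with Vibrating Texture", "Revolving"]},
    index=[73418, 73733],
)

# reset_index() moves the old labels into a regular column named "index"
# and gives the DataFrame a fresh 0-based integer index
toy_reset = toy.reset_index()
print(toy_reset)
```

If the old labels were not worth keeping, `reset_index(drop=True)` would discard them instead of turning them into a column; here we want to keep them, since they hold the `Index` values.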

```python
# Reset the sculptures DataFrame index
sculptures_reset = sculptures.reset_index()

# Print out the resulting dataset
sculptures_reset.head()
```

Unnamed: 0,index,Title,Artist,ConstituentID,Date,Medium,Dimensions,CreditLine,AccessionNumber,Classification,...,ThumbnailURL,Circumference (cm),Depth (cm),Diameter (cm),Height (cm),Length (cm),Weight (kg),Width (cm),Seat Height (cm),Duration (sec.)
0,73418,Surface with Vibrating Texture,Getulio Alviani,137.0,1964,Brushed aluminum on board,"33 x 32 3/4"" (83.6 x 83.2 cm)",Larry Aldrich Foundation Fund,105.1965,Sculpture,...,http://www.moma.org/media/W1siZiIsIjIwODIwOCJd...,,,,83.6,,,83.2,,
1,73474,IN RELATION TO AN INCREASE IN QUANTITY REGARDL...,Lawrence Weiner,6288.0,1973-74,LANGUAGE + THE MATERIALS REFERRED TO,Dimensions variable,Given anonymously,117.1975,Sculpture,...,http://www.moma.org/media/W1siZiIsIjMxODk1MSJd...,,,,,,,,,
2,73564,3 Standard Stoppages,Marcel Duchamp,1634.0,Paris 1913-14,"Wood box 11 1/8 x 50 7/8 x 9"" (28.2 x 129.2 x ...",,Katherine S. Dreier Bequest,149.1953.a-i,Sculpture,...,http://www.moma.org/media/W1siZiIsIjEzODY0NSJd...,,,,13.3,,,120.0,,
3,73567,To Be Looked at (from the Other Side of the Gl...,Marcel Duchamp,1634.0,Buenos Aires 1918,"Oil, silver leaf, lead wire, and magnifying le...","Overall 22"" (55.8 cm) high",Katherine S. Dreier Bequest,150.1953,Sculpture,...,http://www.moma.org/media/W1siZiIsIjI0MzI3MyJd...,,,,49.5,,,39.7,,
4,73733,Revolving,Kurt Schwitters,5293.0,1919,"Wood, metal, cord, cardboard, wool, wire, leat...","48 3/8 x 35"" (122.7 x 88.7 cm)",Advisory Committee Fund,231.1968,Sculpture,...,http://www.moma.org/media/W1siZiIsIjEyMjc3MCJd...,,,,122.7,,,88.7,,


### Renaming column labels

When we reset our index, a new column labeled `index` was created. Let's rename this column to `Index` (with an uppercase "I") to match our other datasets using the DataFrame method `rename()`.
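`rename()` takes a dictionary that maps old labels to new ones. One detail worth knowing: by default it silently ignores keys that do not match any column, so a typo in the old name leaves the DataFrame unchanged. The sketch below, on hypothetical toy data, shows the rename and how passing `errors='raise'` turns such a typo into an error.

```python
import pandas as pd

# Toy data (hypothetical) standing in for the reset sculptures DataFrame
toy_reset = pd.DataFrame({"index": [73418, 73733], "Title": ["A", "B"]})

# Map the old label to the new label
toy_renamed = toy_reset.rename(columns={"index": "Index"})
print(list(toy_renamed.columns))

# With the default errors='ignore', a misspelled key is silently skipped...
unchanged = toy_reset.rename(columns={"indx": "Index"})
print(list(unchanged.columns))

# ...whereas errors='raise' reports the label that was not found
try:
    toy_reset.rename(columns={"indx": "Index"}, errors="raise")
    typo_caught = False
except KeyError:
    typo_caught = True
print(typo_caught)
```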

```python
# Rename the column we created
sculptures_rename = sculptures_reset.rename(columns={'index':'Index'})

# Print out the first five rows of the dataset
sculptures_rename.head()
```

Unnamed: 0,Index,Title,Artist,ConstituentID,Date,Medium,Dimensions,CreditLine,AccessionNumber,Classification,...,ThumbnailURL,Circumference (cm),Depth (cm),Diameter (cm),Height (cm),Length (cm),Weight (kg),Width (cm),Seat Height (cm),Duration (sec.)
0,73418,Surface with Vibrating Texture,Getulio Alviani,137.0,1964,Brushed aluminum on board,"33 x 32 3/4"" (83.6 x 83.2 cm)",Larry Aldrich Foundation Fund,105.1965,Sculpture,...,http://www.moma.org/media/W1siZiIsIjIwODIwOCJd...,,,,83.6,,,83.2,,
1,73474,IN RELATION TO AN INCREASE IN QUANTITY REGARDL...,Lawrence Weiner,6288.0,1973-74,LANGUAGE + THE MATERIALS REFERRED TO,Dimensions variable,Given anonymously,117.1975,Sculpture,...,http://www.moma.org/media/W1siZiIsIjMxODk1MSJd...,,,,,,,,,
2,73564,3 Standard Stoppages,Marcel Duchamp,1634.0,Paris 1913-14,"Wood box 11 1/8 x 50 7/8 x 9"" (28.2 x 129.2 x ...",,Katherine S. Dreier Bequest,149.1953.a-i,Sculpture,...,http://www.moma.org/media/W1siZiIsIjEzODY0NSJd...,,,,13.3,,,120.0,,
3,73567,To Be Looked at (from the Other Side of the Gl...,Marcel Duchamp,1634.0,Buenos Aires 1918,"Oil, silver leaf, lead wire, and magnifying le...","Overall 22"" (55.8 cm) high",Katherine S. Dreier Bequest,150.1953,Sculpture,...,http://www.moma.org/media/W1siZiIsIjI0MzI3MyJd...,,,,49.5,,,39.7,,
4,73733,Revolving,Kurt Schwitters,5293.0,1919,"Wood, metal, cord, cardboard, wool, wire, leat...","48 3/8 x 35"" (122.7 x 88.7 cm)",Advisory Committee Fund,231.1968,Sculpture,...,http://www.moma.org/media/W1siZiIsIjEyMjc3MCJd...,,,,122.7,,,88.7,,


### Concatenate DataFrames

We want to be able to work with all of the data we have imported at once, so we need to combine all three DataFrames into one. They now share the same column labels, so we can stack them one on top of another using the pandas function `concat()`, which lines the rows up by column label (a column missing from one of the DataFrames would be filled with `NaN` for its rows).

We also need to consider the current index labels: each dataset has its own labels starting at 0, so stacking them as-is would repeat labels. Passing the keyword argument `ignore_index=True` to `concat()` discards the original labels and creates a new zero-based integer index for the concatenated dataset.
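A minimal sketch of how `concat()` lines rows up, using two hypothetical toy DataFrames: the second one lists its columns in a different order and has an extra column. Rows are matched by column label, the extra column is filled with `NaN` where it is missing, and `ignore_index=True` is what prevents the repeated `0` label.

```python
import pandas as pd

# Toy frames (hypothetical); the second lists its columns in a different
# order and has one extra column
df_a = pd.DataFrame({"Index": [1, 2], "Title": ["A", "B"]})
df_b = pd.DataFrame({"Title": ["C"], "Index": [3], "Extra": ["x"]})

# Rows are stacked and matched up by column label, not by position
stacked = pd.concat([df_a, df_b], ignore_index=True)
print(stacked)

# Without ignore_index, the original labels (0, 1 and 0) are kept, so one repeats
kept = pd.concat([df_a, df_b])
print(list(kept.index))
```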

```python
# Concatenate all the datasets into one
moma_art = pd.concat([paintings, photos, sculptures_rename], ignore_index=True)

# Print the shape (number of rows and columns) of the full DataFrame
moma_art.shape
```

```
(35503, 25)
```

### Join DataFrames on shared column values

Our dataset includes a column of artist IDs (`ConstituentID`), each of which identifies a specific artist. MoMA also provides a separate dataset of artist information in which each row holds biographical details about one artist:

|ConstituentID  |DisplayName    |ArtistBio  |Nationality | ... |
|---|---|---|---|---|
|1  |Robert Arneson |"American, 1930–1992"  |American   | ...   |
|2  |Doroteo Arnaiz |"Spanish, born 1936"   |Spanish    | ...   |
| &#8942; | &#8942; | &#8942; | &#8942; | &#8942; |
|133026 |Alfred Tritschler  |"German, 1905 – 1970"  |German | ...   |

The artists dataset also includes a column of artist IDs in `ConstituentID`. Let's join our DataFrame of MoMA artworks (`moma_art`) with the artists dataset on the shared values of each dataset's `ConstituentID` column using the pandas function `merge()`. The resulting DataFrame will include the artist's biographical information alongside each artwork.

We first need to load the artists dataset as a DataFrame. The URL to the dataset is stored in the variable `artists_file_url`.

```python
# Load artists dataset (stored in a CSV file)
artists_file_url = 'https://raw.githubusercontent.com/ncsu-libraries-data-vis/teaching-datasets/main/moma_data/moma_artists.csv'
artists = pd.read_csv(artists_file_url)

# Print the new DataFrame
artists
```

Unnamed: 0,ConstituentID,DisplayName,ArtistBio,Nationality,Gender,BeginDate,EndDate,Wiki QID,ULAN
0,1,Robert Arneson,"American, 1930–1992",American,Male,1930,1992,,
1,2,Doroteo Arnaiz,"Spanish, born 1936",Spanish,Male,1936,0,,
2,3,Bill Arnold,"American, born 1941",American,Male,1941,0,,
3,4,Charles Arnoldi,"American, born 1946",American,Male,1946,0,Q1063584,500027998.0
4,5,Per Arnoldi,"Danish, born 1941",Danish,Male,1941,0,,
...,...,...,...,...,...,...,...,...,...
15217,133006,Andrew Chesnutt,"American, 1861–1934",American,Male,1861,1934,,
15218,133007,Lewis Chesnutt,"American, 1860–1933",American,Male,1860,1933,,
15219,133026,Alfred Tritschler,"German, 1905 – 1970",German,,1905,1970,,
15220,133027,Studio of Dr. Paul Wolff & Tritschler,,,,0,0,,


![Left join visual example](./left-join.png)

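We will use a "left" join to merge columns from the `artists` DataFrame into the `moma_art` DataFrame based on matching values in each DataFrame's `ConstituentID` column. A left join keeps every row of `moma_art` (the "left" DataFrame); artworks whose `ConstituentID` has no match in `artists` get missing values (`NaN`) in the new artist columns.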
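The behavior of the left join can be checked on a few hypothetical rows before running it on the full data. In this sketch, one toy artwork uses a made-up `ConstituentID` (`999.0`) with no artist record, so it keeps its row but gets `NaN` in the artist columns. Two optional `merge()` parameters are added as safeguards: `indicator=True` adds a `_merge` column recording whether each row found a match, and `validate='many_to_one'` raises an error if any ID appears more than once in the artists table.

```python
import pandas as pd

# Toy artworks (hypothetical); ID 999.0 deliberately has no artist record.
# The IDs are floats, as in the ConstituentID column of moma_art.
artworks = pd.DataFrame({
    "Title": ["Guitar", "Fire in the Evening", "Untitled"],
    "ConstituentID": [4609.0, 3130.0, 999.0],
})

# Toy artist records, one row per artist
artists = pd.DataFrame({
    "ConstituentID": [4609, 3130],
    "DisplayName": ["Pablo Picasso", "Paul Klee"],
})

merged = pd.merge(
    artworks,
    artists,
    how="left",              # keep every artwork, matched or not
    on="ConstituentID",
    validate="many_to_one",  # error if an artist ID repeats in `artists`
    indicator=True,          # add a "_merge" column describing each match
)
print(merged)
```

If `validate` ever fails on the real data, it means duplicate artist IDs exist; without the check, the left join would quietly repeat the affected artworks and change the row count.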

```python
# Create a new DataFrame from a "left" join of the full artworks DataFrame
# and the artists DataFrame based on the shared column "ConstituentID"
moma_merge = pd.merge(moma_art, artists, how='left', on='ConstituentID')

# Print out the new merged dataset
moma_merge
```

Unnamed: 0,Index,Title,Artist,ConstituentID,Date,Medium,Dimensions,CreditLine,AccessionNumber,Classification,...,Seat Height (cm),Duration (sec.),DisplayName,ArtistBio,Nationality,Gender,BeginDate,EndDate,Wiki QID,ULAN
0,32095,"Rope and People, I",Joan Miró,4016.0,"Barcelona, March 27, 1935","Oil on cardboard mounted on wood, with coil of...","41 1/4 x 29 3/8"" (104.8 x 74.6 cm)",Gift of the Pierre Matisse Gallery,71.1936,Painting,...,,,Joan Miró,"Spanish, 1893–1983",Spanish,Male,1893.0,1983.0,Q152384,500014094.0
1,33167,Fire in the Evening,Paul Klee,3130.0,1929,Oil on cardboard,"13 3/8 x 13 1/4"" (33.8 x 33.3 cm)",Mr. and Mrs. Joachim Jean Aberbach Fund,153.1970,Painting,...,,,Paul Klee,"German, born Switzerland. 1879–1940",German,Male,1879.0,1940.0,Q44007,500010493.0
2,33424,Portrait of an Equilibrist,Paul Klee,3130.0,1927,Oil and collage on cardboard over wood with pa...,"24 7/8 x 15 3/4"" (63.2 x 40 cm)",Mrs. Simon Guggenheim Fund,195.1966,Painting,...,,,Paul Klee,"German, born Switzerland. 1879–1940",German,Male,1879.0,1940.0,Q44007,500010493.0
3,34481,Guitar,Pablo Picasso,4609.0,"Paris, early 1919","Oil, charcoal and pinned paper on canvas","7' 1"" x 31"" (216 x 78.8 cm)",Gift of A. Conger Goodyear,384.1955,Painting,...,,,Pablo Picasso,"Spanish, 1881–1973",Spanish,Male,1881.0,1973.0,Q5593,500009666.0
4,35396,Grandmother,Arthur Dove,1602.0,1925,"Shingles, needlepoint, page from Concordance, ...","20 x 21 1/4"" (50.8 x 54.0 cm)",Gift of Philip L. Goodwin (by exchange),636.1939,Painting,...,,,Arthur Dove,"American, 1880–1946",American,Male,1880.0,1946.0,Q709461,500018046.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35498,136495,Disease Thrower #5,Guadalupe Maravilla,131948.0,2019,"Glass, steel, wood, cotton, plastic, loofah, p...","91 × 55 × 45"" (231.1 × 139.7 × 114.3 cm)",Fund for the Twenty-First Century,703.2019,Sculpture,...,,,Guadalupe Maravilla,"Salvadoran American, born 1976",Salvadoran,Male,1976.0,0.0,,
35499,136496,Circle Serpent,Guadalupe Maravilla,131948.0,2019,Maguey leaves,Dimensions variable,Fund for the Twenty-First Century,704.2019,Sculpture,...,,,Guadalupe Maravilla,"Salvadoran American, born 1976",Salvadoran,Male,1976.0,0.0,,
35500,136526,Candle Piece,Richard Serra,5349.0,1967,Wood and candle,"11 13/16 × 144 1/8 × 3 9/16"" (30 × 366 × 9 cm)",Gift of Peter Freeman,707.2019.a-b,Sculpture,...,,,Richard Serra,"American, born 1938",American,Male,1938.0,0.0,Q321245,500029327.0
35501,136539,Unfinished Construction. Posthumous Homage to ...,Mathias Goeritz,2203.0,c. 1953,Painted wood,"36 5/8 × 14 15/16 × 28 3/4"" (93 × 38 × 73 cm)",,TR16416,Sculpture,...,,,Mathias Goeritz,"German, 1915–1990",German,Male,1915.0,1990.0,Q64487,500023656.0


### Removing unnecessary columns

We can reduce the size of our combined dataset by removing columns that are not important for our analyses. Columns can be "dropped" from a DataFrame using the DataFrame method `drop()`.

```python
# Print out the column labels for the full dataset of artworks and artist info
moma_merge.columns
```

```
Index(['Index', 'Title', 'Artist', 'ConstituentID', 'Date', 'Medium',
       'Dimensions', 'CreditLine', 'AccessionNumber', 'Classification',
       'Department', 'DateAcquired', 'Cataloged', 'ObjectID', 'URL',
       'ThumbnailURL', 'Circumference (cm)', 'Depth (cm)', 'Diameter (cm)',
       'Height (cm)', 'Length (cm)', 'Weight (kg)', 'Width (cm)',
       'Seat Height (cm)', 'Duration (sec.)', 'DisplayName', 'ArtistBio',
       'Nationality', 'Gender', 'BeginDate', 'EndDate', 'Wiki QID', 'ULAN'],
      dtype='object')
```

We will not be using any of the external link resources, so we can remove the columns `URL`, `ThumbnailURL`, and `Wiki QID`.
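A small sketch of `drop(columns=...)` on hypothetical toy data. By default `drop()` raises a `KeyError` if a listed column does not exist; passing `errors='ignore'` skips missing labels instead, which can help if the same cleanup step is rerun or applied to datasets whose columns differ slightly. The URLs and the `Q0` value below are placeholders, not MoMA data.

```python
import pandas as pd

# Toy data (hypothetical) with the three link columns plus one we keep
toy = pd.DataFrame({
    "Title": ["Guitar"],
    "URL": ["http://example.org/a"],
    "ThumbnailURL": ["http://example.org/a.jpg"],
    "Wiki QID": ["Q0"],
})

link_columns = ["URL", "ThumbnailURL", "Wiki QID"]

# drop() returns a new DataFrame; the original `toy` is left unchanged
trimmed = toy.drop(columns=link_columns)
print(list(trimmed.columns))

# Dropping the same labels again would normally raise KeyError;
# errors="ignore" makes the call a no-op for labels that are already gone
trimmed_again = trimmed.drop(columns=link_columns, errors="ignore")
print(list(trimmed_again.columns))
```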

```python
# Remove specified columns from the dataset using "drop()"
moma_final = moma_merge.drop(columns=['URL', 'ThumbnailURL', 'Wiki QID'])

# Print out the column labels from the new DataFrame
moma_final.columns
```

```
Index(['Index', 'Title', 'Artist', 'ConstituentID', 'Date', 'Medium',
       'Dimensions', 'CreditLine', 'AccessionNumber', 'Classification',
       'Department', 'DateAcquired', 'Cataloged', 'ObjectID',
       'Circumference (cm)', 'Depth (cm)', 'Diameter (cm)', 'Height (cm)',
       'Length (cm)', 'Weight (kg)', 'Width (cm)', 'Seat Height (cm)',
       'Duration (sec.)', 'DisplayName', 'ArtistBio', 'Nationality', 'Gender',
       'BeginDate', 'EndDate', 'ULAN'],
      dtype='object')
```

### Filtering rows in a DataFrame

We can filter the rows of a DataFrame using conditional statements that test whether values in one or more columns meet the provided criteria. This is helpful if, say, we want to remove unneeded rows of data or focus on a specific range of data.

Let's first demonstrate how to write a conditional statement that tests the values of a single column by identifying all artworks whose artists were born before the 20th century (`BeginDate < 1900`).

```python
# Find which values in the "BeginDate" column are less than 1900
moma_final['BeginDate'] < 1900
```

```
0         True
1         True
2         True
3         True
4         True
         ...  
35498    False
35499    False
35500    False
35501    False
35502    False
Name: BeginDate, Length: 35503, dtype: bool
```

Now we can use the conditional statement above to filter our entire DataFrame so that it returns only the rows marked `True`; in other words, rows in which the artist who created the artwork was born before 1900.

This operation is called **boolean indexing** because the `True` values in the Series returned by the conditional statement determine which rows are kept in the new DataFrame.
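A caveat about this particular filter, shown on hypothetical toy rows: in the artists table, a `BeginDate` of `0` appears where no birth year is given (for example, the "Studio of Dr. Paul Wolff & Tritschler" row above), and artworks without a matching artist have `NaN` after the left join. Because `0 < 1900` is `True`, undated artists pass the filter, while any comparison with `NaN` evaluates to `False`. Assuming `0` means "unknown", `Series.between()` can exclude it explicitly.

```python
import pandas as pd

# Toy rows (hypothetical) mimicking the merged data: an artist born before 1900,
# one born after, an undated artist stored as 0, and an unmatched artwork (NaN)
toy = pd.DataFrame({
    "DisplayName": ["Paul Klee", "Yayoi Kusama", "Unknown studio", None],
    "BeginDate": [1879.0, 1929.0, 0.0, float("nan")],
})

mask = toy["BeginDate"] < 1900   # boolean Series, one value per row
print(mask.tolist())

naive = toy[mask]                # keeps Klee *and* the 0-dated row

# Assumption: 0 marks an unknown birth year, so require a year from 1 to 1899
# (between() includes both endpoints by default; NaN is never "between")
dated = toy[toy["BeginDate"].between(1, 1899)]
print(dated["DisplayName"].tolist())
```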

```python
# Filter the final DataFrame to only return rows of artworks whose artist was
# born before 1900
moma_final[moma_final['BeginDate'] < 1900]
```

Unnamed: 0,Index,Title,Artist,ConstituentID,Date,Medium,Dimensions,CreditLine,AccessionNumber,Classification,...,Width (cm),Seat Height (cm),Duration (sec.),DisplayName,ArtistBio,Nationality,Gender,BeginDate,EndDate,ULAN
0,32095,"Rope and People, I",Joan Miró,4016.0,"Barcelona, March 27, 1935","Oil on cardboard mounted on wood, with coil of...","41 1/4 x 29 3/8"" (104.8 x 74.6 cm)",Gift of the Pierre Matisse Gallery,71.1936,Painting,...,74.600000,,,Joan Miró,"Spanish, 1893–1983",Spanish,Male,1893.0,1983.0,500014094.0
1,33167,Fire in the Evening,Paul Klee,3130.0,1929,Oil on cardboard,"13 3/8 x 13 1/4"" (33.8 x 33.3 cm)",Mr. and Mrs. Joachim Jean Aberbach Fund,153.1970,Painting,...,33.300000,,,Paul Klee,"German, born Switzerland. 1879–1940",German,Male,1879.0,1940.0,500010493.0
2,33424,Portrait of an Equilibrist,Paul Klee,3130.0,1927,Oil and collage on cardboard over wood with pa...,"24 7/8 x 15 3/4"" (63.2 x 40 cm)",Mrs. Simon Guggenheim Fund,195.1966,Painting,...,36.800000,,,Paul Klee,"German, born Switzerland. 1879–1940",German,Male,1879.0,1940.0,500010493.0
3,34481,Guitar,Pablo Picasso,4609.0,"Paris, early 1919","Oil, charcoal and pinned paper on canvas","7' 1"" x 31"" (216 x 78.8 cm)",Gift of A. Conger Goodyear,384.1955,Painting,...,78.700000,,,Pablo Picasso,"Spanish, 1881–1973",Spanish,Male,1881.0,1973.0,500009666.0
4,35396,Grandmother,Arthur Dove,1602.0,1925,"Shingles, needlepoint, page from Concordance, ...","20 x 21 1/4"" (50.8 x 54.0 cm)",Gift of Philip L. Goodwin (by exchange),636.1939,Painting,...,54.000000,,,Arthur Dove,"American, 1880–1946",American,Male,1880.0,1946.0,500018046.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35220,107498,Standing Nude,Henri Matisse,3832.0,Collioure 1906 (cast 1950),Bronze,"19"" (48.3 cm) high",Gift of the Estate of Mimi and Bernie West,855.2011,Sculpture,...,0.000000,,,Henri Matisse,"French, 1869–1954",French,Male,1869.0,1954.0,500017300.0
35320,112514,Construction with Curved Forms,Joaquín Torres-García,5907.0,1931,Oil and nails on wood,"19 1/2 x 16 1/8 x 1/2"" (49.5 x 41 x 1.3 cm)",Given anonymously,1787.2012,Sculpture,...,40.957582,,,Joaquín Torres-García,"Uruguayan, 1874–1949",Uruguayan,Male,1874.0,1949.0,500031259.0
35383,121014,Seated Woman,Pablo Picasso,4609.0,"Vallauris, 1947",Glazed earthenware,"7 1/2 x 1 15/16 x 2 3/4"" (19 x 5 x 7 cm)",Gift of Almine and Bernard Ruiz-Picasso,551.2016,Sculpture,...,5.000000,,,Pablo Picasso,"Spanish, 1881–1973",Spanish,Male,1881.0,1973.0,500009666.0
35460,129454,Nurse Supervisor,William Edmondson,17500.0,c. 1940,Limestone,"13 1/2 × 8 1/2 × 5 1/4"" (34.3 × 21.6 × 13.3 cm)",Gift of Alice and Tom Tisch in honor of AC Hud...,557.2017,Sculpture,...,21.590043,,,William Edmondson,"American, 1874–1951",,Male,1874.0,1951.0,


```python
# Filter the final DataFrame to only return rows of artworks by Pablo Picasso
moma_final[moma_final['Artist'] == 'Pablo Picasso']
```

Unnamed: 0,Index,Title,Artist,ConstituentID,Date,Medium,Dimensions,CreditLine,AccessionNumber,Classification,...,Width (cm),Seat Height (cm),Duration (sec.),DisplayName,ArtistBio,Nationality,Gender,BeginDate,EndDate,ULAN
3,34481,Guitar,Pablo Picasso,4609.0,"Paris, early 1919","Oil, charcoal and pinned paper on canvas","7' 1"" x 31"" (216 x 78.8 cm)",Gift of A. Conger Goodyear,384.1955,Painting,...,78.7,,,Pablo Picasso,"Spanish, 1881–1973",Spanish,Male,1881.0,1973.0,500009666.0
18,73046,Girl before a Mirror,Pablo Picasso,4609.0,"Paris, March 14, 1932",Oil on canvas,"64 x 51 1/4"" (162.3 x 130.2 cm)",Gift of Mrs. Simon Guggenheim,2.1938,Painting,...,130.2,,,Pablo Picasso,"Spanish, 1881–1973",Spanish,Male,1881.0,1973.0,500009666.0
103,73131,Pierrot,Pablo Picasso,4609.0,"Paris, 1918",Oil on canvas,"36 1/2 x 28 3/4"" (92.7 x 73 cm)",Sam A. Lewisohn Bequest,12.1952,Painting,...,73.0,,,Pablo Picasso,"Spanish, 1881–1973",Spanish,Male,1881.0,1973.0,500009666.0
107,73135,Night Fishing at Antibes,Pablo Picasso,4609.0,"Antibes, August 1939",Oil on canvas,"6' 9"" x 11' 4"" (205.8 x 345.4 cm)",Mrs. Simon Guggenheim Fund,13.1952,Painting,...,345.4,,,Pablo Picasso,"Spanish, 1881–1973",Spanish,Male,1881.0,1973.0,500009666.0
209,73237,Woman by a Window,Pablo Picasso,4609.0,"Cannes, June 1956",Oil on canvas,"63 3/4 x 51 1/4"" (162 x 130 cm)",Mrs. Simon Guggenheim Fund,30.1957,Painting,...,130.0,,,Pablo Picasso,"Spanish, 1881–1973",Spanish,Male,1881.0,1973.0,500009666.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
34623,75673,The Jester,Pablo Picasso,4609.0,1905 (cast 1950s),Bronze,"15 1/4 x 13 3/4 x 8 5/8"" (38.7 x 34.8 x 21.9 cm)",Louise Reinhardt Smith Bequest,789.1995,Sculpture,...,34.9,,,Pablo Picasso,"Spanish, 1881–1973",Spanish,Male,1881.0,1973.0,500009666.0
34752,75811,Woman's Head (Fernande),Pablo Picasso,4609.0,"Paris, fall 1909",Bronze,"16 1/4 x 9 3/4 x 10 1/2"" (41.3 x 24.7 x 26.6 cm)",Purchase,1632.1940,Sculpture,...,24.7,,,Pablo Picasso,"Spanish, 1881–1973",Spanish,Male,1881.0,1973.0,500009666.0
34759,75818,Plate with Still Life,Pablo Picasso,4609.0,1954,Glazed stoneware,"2 7/8 x 14 3/4 x 12 1/2"" (7.2 x 37.4 x 31.7 cm)",Gift of R. Thornton Wilson,2511.1967,Sculpture,...,37.4,,,Pablo Picasso,"Spanish, 1881–1973",Spanish,Male,1881.0,1973.0,500009666.0
34779,81126,Pregnant Woman,Pablo Picasso,4609.0,"Vallauris, 1950","Plaster with metal armature, wood, ceramic ves...","43 1/4 x 8 5/8 x 12 1/2"" (110 x 22 x 32 cm)",Gift of Louise Reinhardt Smith and gift of Jac...,376.2003,Sculpture,...,22.0,,,Pablo Picasso,"Spanish, 1881–1973",Spanish,Male,1881.0,1973.0,500009666.0


Filtering on multiple conditions uses a slightly different syntax than a standard Python conditional statement with logical operators (that is, conditional statements using `and`, `or`, or `not`). When filtering a DataFrame with multiple conditional statements, use the operators `&` in place of `and`, `|` in place of `or`, and `~` in place of `not`. Additionally, each statement must be surrounded by parentheses, because `&`, `|`, and `~` take precedence over comparison operators such as `==` and `>=`.
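Here is a small sketch of the three operators on hypothetical toy rows, followed by what happens when the parentheses are left out. Storing each condition in a named variable first, as done here, is another way to keep the order of operations unambiguous.

```python
import pandas as pd

# Toy data (hypothetical)
toy = pd.DataFrame({
    "Nationality": ["Japanese", "Japanese", "French"],
    "BeginDate": [1929, 1850, 1869],
})

is_japanese = toy["Nationality"] == "Japanese"
born_1900_or_later = toy["BeginDate"] >= 1900

both = toy[is_japanese & born_1900_or_later]     # and
either = toy[is_japanese | born_1900_or_later]   # or
not_japanese = toy[~is_japanese]                 # not

# Without parentheses, `&` binds before `>=` and `<`, so this parses as the
# chained comparison toy["BeginDate"] >= (1900 & toy["BeginDate"]) < 1950;
# Python then asks for the truth value of a whole Series, which raises ValueError
try:
    toy[toy["BeginDate"] >= 1900 & toy["BeginDate"] < 1950]
    missing_parens_failed = False
except ValueError:
    missing_parens_failed = True
print(missing_parens_failed)
```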

Let's filter the full MoMA artworks DataFrame to return only artworks by Japanese artists (`Nationality == 'Japanese'`) born in the 20th century or later (`BeginDate >= 1900`).

```python
# Filter the final DataFrame to only return rows of artworks by Japanese artists
# born in the 20th century or later
moma_final[
    (moma_final['Nationality'] == 'Japanese')
    & (moma_final['BeginDate'] >= 1900)
]
```

Unnamed: 0,Index,Title,Artist,ConstituentID,Date,Medium,Dimensions,CreditLine,AccessionNumber,Classification,...,Width (cm),Seat Height (cm),Duration (sec.),DisplayName,ArtistBio,Nationality,Gender,BeginDate,EndDate,ULAN
81,73109,B-171,Tadasky (Tadasuke Kuwayama),5776.0,1964,Synthetic polymer paint on canvas,"15 1/8 x 15 1/8"" (38.4 x 38.4 cm)",Given anonymously,8.1965,Painting,...,38.400000,,,Tadasky (Tadasuke Kuwayama),"American, born Japan 1935",Japanese,Male,1935.0,0.0,
161,73189,Kabuki,Kumi Sugaï,5719.0,1958,Oil and gilt paint on canvas,"57 1/2 x 44 5/8"" (145.8 x 113.3 cm)",Purchase,26.1959,Painting,...,113.300000,,,Kumi Sugaï,"Japanese, 1919–1996",Japanese,Male,1919.0,1996.0,500025060.0
258,73290,"DEC. 12, 1979",On Kawara,3030.0,1979,Acrylic on canvas,"18 1/4 x 24 3/8"" (46.4 x 62.5 cm), box 2 x 25 ...",Blanchette Hooker Rockefeller Fund,61.1981.1-2,Painting,...,62.500000,,,On Kawara,"Japanese, 1933–2014",Japanese,Male,1933.0,0.0,500120601.0
261,73293,"DEC. 17, 1979",On Kawara,3030.0,1979,Acrylic on canvas,"18 3/16 × 24 3/8"" (46.2 × 61.7 cm), box 2 × 25...",Blanchette Hooker Rockefeller Fund,62.1981.1-2,Painting,...,61.700000,,,On Kawara,"Japanese, 1933–2014",Japanese,Male,1933.0,0.0,500120601.0
264,73296,"DEC. 18, 1979",On Kawara,3030.0,1979,Acrylic on canvas,"18 1/4 × 24 1/4"" (46.3 × 61.6 cm), box 2 × 25 ...",Blanchette Hooker Rockefeller Fund,63.1981.1-2,Painting,...,61.600000,,,On Kawara,"Japanese, 1933–2014",Japanese,Male,1933.0,0.0,500120601.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35312,111791,Accumulation No. 1,Yayoi Kusama,3315.0,1962,"Sewn stuffed fabric, paint, and chair fringe","37 x 39 x 43"" (94 x 99.1 x 109.2 cm)",Gift of William B. Jaffe and Evelyn A. J. Hall...,1182.2012,Sculpture,...,99.060198,,,Yayoi Kusama,"Japanese, born 1929",Japanese,Female,1929.0,0.0,500122518.0
35337,113805,Voice,Katsuhiro Yamaguchi,35538.0,1962,Iron and sack cloth,"47 1/4 x 43 11/16 x 20 1/16"" (120 x 111 x 51 cm)",Gift of Ronald O. Perelman,552.2013,Sculpture,...,111.000000,,,Katsuhiro Yamaguchi,"Japanese, born 1928",Japanese,Male,1928.0,0.0,
35338,113806,Untitled,Katsuhiro Yamaguchi,35538.0,1962-63,Iron and sack cloth,"39 3/8 x 27 3/16 x 2 15/16"" (100 x 69 x 7.5 cm)",Gift of Dakis Joannou,553.2013,Sculpture,...,69.000000,,,Katsuhiro Yamaguchi,"Japanese, born 1928",Japanese,Male,1928.0,0.0,
35339,113807,Voice,Katsuhiro Yamaguchi,35538.0,1963,Iron and sack cloth,"39 9/16 x 23 5/8 x 17 1/8"" (100.5 x 60 x 43.5 cm)",Gift of James Keith Brown and Eric Diefenbach,554.2013,Sculpture,...,60.000000,,,Katsuhiro Yamaguchi,"Japanese, born 1928",Japanese,Male,1928.0,0.0,

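To see why `&` and the parentheses are required, the sketch below runs the same kind of filter on a small, made-up DataFrame standing in for `moma_final` (the name `df` and its values are illustrative only). It records the errors raised by the two incorrect forms and then applies the correct one.

```python
import pandas as pd

# A small, made-up DataFrame standing in for moma_final (illustrative values only)
df = pd.DataFrame({
    'Nationality': ['Japanese', 'Japanese', 'American'],
    'BeginDate': [1935.0, 1880.0, 1950.0],
})

caught = []  # names of the errors raised by the incorrect filters below

# 1. Python's `and` needs a single True/False for each whole Series, which is
#    ambiguous, so pandas raises a ValueError
try:
    df[(df['Nationality'] == 'Japanese') and (df['BeginDate'] >= 1900)]
except ValueError as err:
    caught.append(type(err).__name__)

# 2. Without parentheses, `&` is evaluated before `==` and `>=`, so the
#    expression is grouped incorrectly and raises an error
try:
    df[df['Nationality'] == 'Japanese' & df['BeginDate'] >= 1900]
except (TypeError, ValueError) as err:
    caught.append(type(err).__name__)

# 3. Correct: element-wise `&` on two parenthesized boolean Series
mask = (df['Nationality'] == 'Japanese') & (df['BeginDate'] >= 1900)
print(caught)
print(df[mask])
```

Only the first row satisfies both conditions, so `df[mask]` returns a single row.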

----

## Open work time

You can use this time to ask questions, collaborate, or work on the following exercises (on your own or in a group).

For these exercises, you will use two new datasets, one containing MoMA artworks classified as audio and one containing artworks classified as video, along with the artists dataset already imported as the DataFrame `artists`. The audio and video datasets contain the same columns as the artworks datasets you have been working with.

Before starting the exercises, you will need to load the new datasets as DataFrames. Both datasets are CSV files, and the URL to each file is provided in the variables below.

In [12]:
# URLs to the audio and video datasets
audios_file_url = 'https://raw.githubusercontent.com/ncsu-libraries-data-vis/teaching-datasets/main/moma_data/moma_audios.csv'
videos_file_url = 'https://raw.githubusercontent.com/ncsu-libraries-data-vis/teaching-datasets/main/moma_data/moma_videos.csv'

# Import both files and create a DataFrame for each
moma_audios = pd.read_csv(audios_file_url)
moma_videos = pd.read_csv(videos_file_url)

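Because the exercises rely on the audio and video datasets sharing the same columns, a quick check before concatenating can confirm it. This is a minimal sketch using tiny made-up DataFrames (`audios_sample`, `videos_sample`, and `extra_sample` are hypothetical stand-ins, not the real data). The last lines show what `pd.concat` does when the columns do not match: it aligns columns by label and fills the gaps with `NaN`.

```python
import pandas as pd

# Tiny, made-up stand-ins for moma_audios and moma_videos
audios_sample = pd.DataFrame({'Title': ['Work A'], 'Duration (sec.)': [2700.0]})
videos_sample = pd.DataFrame({'Title': ['Work B'], 'Duration (sec.)': [1191.0]})

# True only if both DataFrames have the same column labels in the same order
same_columns = audios_sample.columns.equals(videos_sample.columns)

# Labels found in one DataFrame but not the other (empty when they match)
mismatched = audios_sample.columns.symmetric_difference(videos_sample.columns)
print(same_columns, list(mismatched))

# A DataFrame with a different column: concat keeps the union of the columns
# and fills the cells that have no value with NaN
extra_sample = pd.DataFrame({'Title': ['Work C'], 'Width (cm)': [10.8]})
combined = pd.concat([audios_sample, extra_sample], ignore_index=True)
print(combined)
```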
### Exercise 1: Concatenate DataFrames

Combine the MoMA audio and video datasets along the rows axis (in other words, stack them on top of each other). Make sure the resulting DataFrame contains unique row index labels (*you can test this by calling `.loc[0]` on the new DataFrame and checking that it returns a single row*).

In [13]:
# Concatenate the audio and video datasets using the pandas method "concat()"
audio_video = pd.concat([moma_audios, moma_videos], ignore_index=True)

# Print out the data at index label 0 from the resulting DataFrame (this should 
# only return one row of data)
audio_video.loc[0]

Index                                                       90895
Title                                          Variations (no. 2)
Artist                                               Robert Barry
ConstituentID                                                 352
Date                                                      1977-78
Medium                                                      Sound
Dimensions                                            45 min.\r\n
CreditLine                          Art & Project/Depot VBVR Gift
AccessionNumber                                          442.2007
Classification                                              Audio
Department                                  Media and Performance
DateAcquired                                           2007-09-24
Cataloged                                                       Y
ObjectID                                                   109520
URL                   http://www.moma.org/collection/works/109520
ThumbnailU

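The role of `ignore_index=True` is easier to see side by side. This sketch uses two tiny, made-up DataFrames (hypothetical stand-ins for `moma_audios` and `moma_videos`), each starting at index label 0, and checks the result with the `Index.is_unique` attribute as well as `.loc[0]`.

```python
import pandas as pd

# Tiny stand-ins for moma_audios and moma_videos; both start at index label 0
audios_sample = pd.DataFrame({'Title': ['Audio A', 'Audio B'], 'Classification': ['Audio', 'Audio']})
videos_sample = pd.DataFrame({'Title': ['Video A', 'Video B'], 'Classification': ['Video', 'Video']})

# Without ignore_index, the original labels are kept, so 0 and 1 appear twice
stacked = pd.concat([audios_sample, videos_sample])
print(stacked.index.is_unique)  # False
print(stacked.loc[0])           # a DataFrame with two rows, not a single row

# With ignore_index=True, pandas assigns new labels 0 to n-1
stacked_unique = pd.concat([audios_sample, videos_sample], ignore_index=True)
print(stacked_unique.index.is_unique)  # True
print(stacked_unique.loc[0])           # a single row, returned as a Series
```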
### Exercise 2: Rename column headers

To keep the representation of artwork measurement units consistent, change the label of the column that contains an artwork's duration from `Duration (sec.)` to `Duration (s)`.

In [14]:
# Rename the "Duration (sec.)" column label using the DataFrame method
# "rename()"
audio_video_rename = audio_video.rename(
    columns={'Duration (sec.)': 'Duration (s)'}
)

# Print out the column labels of the new DataFrame
audio_video_rename.columns

Index(['Index', 'Title', 'Artist', 'ConstituentID', 'Date', 'Medium',
       'Dimensions', 'CreditLine', 'AccessionNumber', 'Classification',
       'Department', 'DateAcquired', 'Cataloged', 'ObjectID', 'URL',
       'ThumbnailURL', 'Circumference (cm)', 'Depth (cm)', 'Diameter (cm)',
       'Height (cm)', 'Length (cm)', 'Weight (kg)', 'Width (cm)',
       'Seat Height (cm)', 'Duration (s)'],
      dtype='object')

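One detail worth knowing about `rename()`: by default it silently skips any label in the mapping that it cannot find, so a typo in the old label leaves the DataFrame unchanged without any warning. Passing `errors='raise'` turns that case into a `KeyError`. The sketch below uses a made-up one-row DataFrame (`sample` is a hypothetical stand-in for `audio_video`).

```python
import pandas as pd

# A made-up one-row stand-in for audio_video
sample = pd.DataFrame({'Title': ['Work A'], 'Duration (sec.)': [2700.0]})

# Typo in the old label ('secs.' instead of 'sec.'): with the default
# errors='ignore', nothing is renamed and no error is raised
unchanged = sample.rename(columns={'Duration (secs.)': 'Duration (s)'})
print(list(unchanged.columns))  # ['Title', 'Duration (sec.)']

# errors='raise' turns the same typo into a KeyError
typo_caught = False
try:
    sample.rename(columns={'Duration (secs.)': 'Duration (s)'}, errors='raise')
except KeyError:
    typo_caught = True

# The correct label renames cleanly, even with errors='raise'
renamed = sample.rename(columns={'Duration (sec.)': 'Duration (s)'}, errors='raise')
print(list(renamed.columns))  # ['Title', 'Duration (s)']
```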
### Exercise 3: Join DataFrames on shared column values

Add the artist information contained in the DataFrame `artists` to the full audio and video artworks dataset. Use a join technique that keeps every row of the audio and video DataFrame and appends columns from the `artists` DataFrame by matching the values in the `ConstituentID` column that both DataFrames share.

In [15]:
# Merge the audio and video DataFrame with the artists DataFrame using a "left"
# join based on the shared column "ConstituentID"
audio_video_join = pd.merge(
    audio_video_rename, artists, how='left', on='ConstituentID'
)

# Print out the new DataFrame
audio_video_join

,Index,Title,Artist,ConstituentID,Date,Medium,Dimensions,CreditLine,AccessionNumber,Classification,...,Seat Height (cm),Duration (s),DisplayName,ArtistBio,Nationality,Gender,BeginDate,EndDate,Wiki QID,ULAN
0,90895,Variations (no. 2),Robert Barry,352.0,1977-78,Sound,45 min.\r\n,Art & Project/Depot VBVR Gift,442.2007,Audio,...,,2700.0,Robert Barry,"American, born 1936",American,Male,1936.0,0.0,Q477373,500053936.0
1,96490,"Variations for Double Bass, performed by the a...",Benjamin Patterson,4520.0,1962,Audio recording,,The Gilbert and Lila Silverman Fluxus Collecti...,2665.2008,Audio,...,,,Benjamin Patterson,"American, 1934–2016",American,Male,1934.0,2016.0,Q817643,500202321.0
2,96495,Duo,Benjamin Patterson,4520.0,,Performed by the artist and William Pearson at...,,The Gilbert and Lila Silverman Fluxus Collecti...,2654.2008,Audio,...,,,Benjamin Patterson,"American, 1934–2016",American,Male,1934.0,2016.0,Q817643,500202321.0
3,96643,Requiem for Wagner the Criminal Mayor,Dick Higgins,2637.0,1962,"5 reel to reel audio tapes, 2 cassette tapes a...",,The Gilbert and Lila Silverman Fluxus Collecti...,2251.2008.1-3,Audio,...,,,Dick Higgins,"American, born England. 1938–1998",American,Male,1938.0,1998.0,Q1209700,500034921.0
4,97246,Neo Dada in the United States,George Maciunas,21398.0,1962,Audio cassette tape,,The Gilbert and Lila Silverman Fluxus Collecti...,2355.2008,Audio,...,,,George Maciunas,"American, born Lithuania. 1931–1978",American,Male,1931.0,1978.0,Q455931,500075547.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2630,136205,Workers Leaving the Googleplex,Andrew Norman Wilson,131861.0,2011,"Seven-channel and single-channel video (color,...",11:03 min. \r\n,Gift of Seth Stolbun and The Stolbun Collectio...,698.2019,Video,...,,,Andrew Norman Wilson,"American, born 1986",American,Male,1986.0,0.0,,
2631,136206,Sweet Desire a.k.a. Burial Piece (window version),Pope.L,37145.0,1996,"Video (Color, sound); 2:17 minutes",,Acquired in part through the generosity of Jil...,88.2019.7,Video,...,,,Pope.L,"American, born 1955",American,Male,1955.0,0.0,Q2547113,500332876.0
2632,136224,How to Make Money Religiously,Laure Prouvost,131903.0,2014,"Video (color, sound)",8:44 min.\r\n,Gift of Robert D. Bielecki,697.2019,Video,...,,,Laure Prouvost,"French, born 1978",French,Female,1978.0,0.0,,
2633,136264,VVEBCAM,Petra Cortright,131907.0,2007,"Webcam video (color, sound)",1:43 min.,Gift of the artist,689.2019,Video,...,,,Petra Cortright,"American, born 1986",American,Female,1986.0,0.0,,

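A left join keeps the same number of rows as the left DataFrame only if each `ConstituentID` appears at most once in the right DataFrame; a repeated ID would duplicate artwork rows. The sketch below, on made-up stand-ins (`works_sample` and `artists_sample` are hypothetical), uses two real `pd.merge` parameters to check this: `validate='many_to_one'` raises a `MergeError` when the right-hand keys are not unique, and `indicator=True` adds a `_merge` column showing which rows found a match.

```python
import pandas as pd

# Made-up stand-ins for audio_video_rename and artists
works_sample = pd.DataFrame({
    'Title': ['Work A', 'Work B', 'Work C'],
    'ConstituentID': [352.0, 352.0, 99999.0],  # 99999.0 has no match below
})
artists_sample = pd.DataFrame({
    'ConstituentID': [352.0, 4520.0],
    'DisplayName': ['Robert Barry', 'Benjamin Patterson'],
})

joined = pd.merge(
    works_sample, artists_sample, how='left', on='ConstituentID',
    validate='many_to_one',  # raise MergeError if an ID repeats on the right
    indicator=True,          # add a '_merge' column: 'both' or 'left_only'
)
print(joined)

# Repeating an ID in the right-hand DataFrame makes validation fail
artists_dup = pd.concat([artists_sample, artists_sample.iloc[[0]]], ignore_index=True)
dup_caught = False
try:
    pd.merge(works_sample, artists_dup, how='left', on='ConstituentID',
             validate='many_to_one')
except pd.errors.MergeError:
    dup_caught = True
```

Rows whose `_merge` value is `left_only` (such as `Work C` here) kept their artwork data but received `NaN` in every artist column.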

### Exercise 4: Drop columns from DataFrame

We won't need the columns that reference external content (for example, the columns labeled `URL` and `Wiki QID`). Use the list of column labels in the code cell below to remove these columns from the full audio and video DataFrame.

In [16]:
# Column labels of columns to remove from the audio and video DataFrame
drop_columns = ['URL', 'ThumbnailURL', 'Wiki QID', 'ULAN']

# Drop the unnecessary columns from the DataFrame
audio_video_drop = audio_video_join.drop(columns=drop_columns)

# Print out the column labels of the new DataFrame
audio_video_drop.columns

Index(['Index', 'Title', 'Artist', 'ConstituentID', 'Date', 'Medium',
       'Dimensions', 'CreditLine', 'AccessionNumber', 'Classification',
       'Department', 'DateAcquired', 'Cataloged', 'ObjectID',
       'Circumference (cm)', 'Depth (cm)', 'Diameter (cm)', 'Height (cm)',
       'Length (cm)', 'Weight (kg)', 'Width (cm)', 'Seat Height (cm)',
       'Duration (s)', 'DisplayName', 'ArtistBio', 'Nationality', 'Gender',
       'BeginDate', 'EndDate'],
      dtype='object')

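By default, `drop()` raises a `KeyError` if any label in the list is missing from the DataFrame, which is a useful safeguard against typos. When the list may legitimately include columns that are not present, `errors='ignore'` drops the labels that exist and skips the rest. This sketch uses a made-up DataFrame (`sample` is a hypothetical stand-in for `audio_video_join` with only some of its columns).

```python
import pandas as pd

# A made-up stand-in for audio_video_join with only some of its columns
sample = pd.DataFrame({
    'Title': ['Work A'],
    'URL': ['example-url'],
    'ULAN': [500053936.0],
})
drop_columns = ['URL', 'ThumbnailURL', 'Wiki QID', 'ULAN']

# Default behavior: 'ThumbnailURL' and 'Wiki QID' are missing, so KeyError
missing_caught = False
try:
    sample.drop(columns=drop_columns)
except KeyError:
    missing_caught = True

# errors='ignore' drops the labels that exist and skips the others
trimmed = sample.drop(columns=drop_columns, errors='ignore')
print(list(trimmed.columns))  # ['Title']

# drop() returns a new DataFrame; the original keeps all of its columns
print(list(sample.columns))   # ['Title', 'URL', 'ULAN']
```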
### Exercise 5: Filter DataFrame rows using single condition

Filter the full audio and video DataFrame to return only artworks with a duration greater than 60 seconds.

*The column `Duration (s)` contains the duration of an artwork in seconds.*

In [52]:
# Filter the full audio and video artworks DataFrame to return only artworks
# with a duration greater than 60 seconds
audio_video_drop[audio_video_drop['Duration (s)'] > 60]

,Index,Title,Artist,ConstituentID,Date,Medium,Dimensions,CreditLine,AccessionNumber,Classification,...,Weight (kg),Width (cm),Seat Height (cm),Duration (s),DisplayName,ArtistBio,Nationality,Gender,BeginDate,EndDate
0,90895,Variations (no. 2),Robert Barry,352.0,1977-78,Sound,45 min.\r\n,Art & Project/Depot VBVR Gift,442.2007,Audio,...,,,,2700.0,Robert Barry,"American, born 1936",American,Male,1936.0,0.0
53,106229,Meetings between Blaine and Irwin,Lynn Hershman Leeson,39696.0,1976,Audio recordings on 3 Compact Discs,"Disc 1: 21 min., 23 sec.\r\nDisc 2: 53 min., 1...",Gift of Gallery Paule Anglim,879.2011.27,Audio,...,,,,1283.0,Lynn Hershman Leeson,"American, born 1941",American,Female,1941.0,0.0
590,80919,What You Mean We?,Laurie Anderson,6807.0,1986,"Video (color, sound)",19:51 min.,Purchase,488.1986,Video,...,,,,1191.0,Laurie Anderson,"American, born 1947",American,Female,1947.0,0.0
592,80921,Three Transitions,Peter Campus,944.0,1973,"Video (color, sound)",4:53 min.,Acquired through the generosity of Barbara Pine,730.1976,Video,...,,,,293.0,Peter Campus,"American, born 1937",American,Male,1937.0,0.0
594,80924,The Loner,Tony Oursler,7489.0,1980,"Video (color, sound)",30 min.,Purchase,714.1981,Video,...,,,,1800.0,Tony Oursler,"American, born 1957",American,Male,1957.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2622,134582,Egg Eating Contest (Basement version),Pope.L,37145.0,1990,"Video (color, sound)",8:04 min.,Acquired in part through the generosity of Jil...,83.2019.1,Video,...,,,,484.0,Pope.L,"American, born 1955",American,Male,1955.0,0.0
2623,134588,Eracism (version 8b),Pope.L,37145.0,2000,"Video (color, sound)",10:24 min.,Acquired in part through the generosity of Jil...,84.2019.1,Video,...,,,,624.0,Pope.L,"American, born 1955",American,Male,1955.0,0.0
2624,134599,Member a.k.a. Schlong Journey,Pope.L,37145.0,1996,"Video (color, sound)",3:39 min.,Acquired in part through the generosity of Jil...,86.2019.1,Video,...,,,,219.0,Pope.L,"American, born 1955",American,Male,1955.0,0.0
2625,134603,Sweet Desire a.k.a. Burial Piece,Pope.L,37145.0,1996,"Video (color, sound)",4:19 min.,Acquired in part through the generosity of Jil...,88.2019.1,Video,...,,,,259.0,Pope.L,"American, born 1955",American,Male,1955.0,0.0

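Many artworks have no value in `Duration (s)`, and a comparison with a missing value (`NaN`) evaluates to `False`, so those rows are silently left out of the filtered result. This sketch, on a made-up DataFrame (`durations` is a hypothetical stand-in), shows the mask produced and counts the missing values explicitly with `isna()`.

```python
import numpy as np
import pandas as pd

# A made-up stand-in for audio_video_drop with one missing duration
durations = pd.DataFrame({
    'Title': ['Work A', 'Work B', 'Work C'],
    'Duration (s)': [2700.0, np.nan, 35.0],
})

# NaN > 60 evaluates to False, so 'Work B' is excluded along with 'Work C'
long_mask = durations['Duration (s)'] > 60
print(long_mask.tolist())  # [True, False, False]
print(durations[long_mask])

# Count the rows that were excluded only because their duration is missing
missing_count = durations['Duration (s)'].isna().sum()
print(missing_count)  # 1
```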

### Exercise 6: Filter DataFrame rows using multiple conditions

Filter the full audio and video artworks DataFrame to return only artworks that have not been cataloged and were not created by artists who identify as Male. Remember, logical operators for boolean indexing are `|` for `or`, `&` for `and`, and `~` for `not`.

*The column `Cataloged` indicates whether an artwork has been cataloged, using the value `Y` to indicate yes and `N` to indicate no.*

*The column `Gender` indicates the gender of an artist and includes the values `Male`, `Female`, `Non-binary`, `Non-Binary`, and `NaN`.*

In [57]:
# Filter the full audio and video DataFrame to return only artworks that
# have not been cataloged and were not created by artists who identify as Male
audio_video_drop[
    (audio_video_drop['Cataloged'] == 'N')
    & ~(audio_video_drop['Gender'] == 'Male')
]

,Index,Title,Artist,ConstituentID,Date,Medium,Dimensions,CreditLine,AccessionNumber,Classification,...,Weight (kg),Width (cm),Seat Height (cm),Duration (s),DisplayName,ArtistBio,Nationality,Gender,BeginDate,EndDate
9,97939,Sommerfest Concert Galerie Parnass,Fluxus Collective,36649.0,"June 9, 1962",Cassette,,The Gilbert and Lila Silverman Fluxus Collecti...,FC886,Audio,...,,,,,Fluxus Collective,,,,0.0,0.0
20,98921,Spiked! The Music of Spike Jones,Spike Jones,38042.0,,cassette tape,"9/16 x 4 1/4 x 2 11/16"" (1.4 x 10.8 x 6.9 cm)",The Gilbert and Lila Silverman Fluxus Collecti...,FC1796,Audio,...,,10.8,,,Spike Jones,,,,0.0,0.0
27,100322,Sveriges Radio Program on Fluxus,Peter R. Meyer,38395.0,1986,,,The Gilbert and Lila Silverman Fluxus Collecti...,FC2058,Audio,...,,0.0,,,Peter R. Meyer,,,,0.0,0.0
29,100597,Fluxus Festival Amsterdam,,,1963,CD,,The Gilbert and Lila Silverman Fluxus Collecti...,FC2421,Audio,...,,0.0,,,,,,,,
30,100599,"Fluxus Festival Amsterdam, Part 1",,,1963,CD,,The Gilbert and Lila Silverman Fluxus Collecti...,FC2424,Audio,...,,0.0,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2509,131697,"Pyramid Club Video. February 15, 1988. King Tu...",Clayton Patterson,67801.0,1988,"Video (color, sound)",,Purchase from the Artist,820028,Video,...,,,,,Clayton Patterson,"Canadian, born 1948",Canadian,,1948.0,0.0
2510,131698,Pyramid Club Video. December 1986. King Tutt's...,Clayton Patterson,67801.0,1986,"Video (color, sound)",,Purchase from the Artist,820029,Video,...,,,,,Clayton Patterson,"Canadian, born 1948",Canadian,,1948.0,0.0
2562,134311,Sleepless Nights,Becky Johnston,75097.0,1979,Video,,Gift of the Artist and Maripol,820519,Video,...,,,,,Becky Johnston,"American, born 1956",American,Female,1956.0,0.0
2564,134329,Untitled,Trisha Donnelly,33913.0,2014,"Video (black and white, silent)","35 sec., looped",Acquired through the generosity of The David S...,691.2019,Video,...,,,,,Trisha Donnelly,"American, born 1974",American,Female,1974.0,0.0

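The output above includes rows where `Gender` is `NaN`, because `~(... == 'Male')` keeps every row that is not exactly `'Male'`, including rows with no recorded gender. This sketch, on a made-up DataFrame (`sample` is a hypothetical stand-in for `audio_video_drop`), shows that `!= 'Male'` gives the same mask and how to add `notna()` if rows with missing gender should be excluded instead.

```python
import numpy as np
import pandas as pd

# A made-up stand-in for audio_video_drop
sample = pd.DataFrame({
    'Cataloged': ['N', 'N', 'N', 'Y'],
    'Gender': ['Male', 'Female', np.nan, 'Non-binary'],
})

# ~(== 'Male') keeps every row that is not exactly 'Male', including NaN
not_male = ~(sample['Gender'] == 'Male')

# != 'Male' gives the same mask, because NaN != 'Male' evaluates to True
also_not_male = sample['Gender'] != 'Male'
print(not_male.equals(also_not_male))  # True

# To also exclude artists with no recorded gender, combine with notna()
known_not_male = also_not_male & sample['Gender'].notna()

uncataloged = sample['Cataloged'] == 'N'
print(sample[uncataloged & not_male])        # rows 1 and 2
print(sample[uncataloged & known_not_male])  # row 1 only
```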

## Further resources

### Unfilled version of this notebook

[Python Open Labs Week 2 unfilled notebook](https://colab.research.google.com/github/ncsu-libraries-data-vis/python-open-labs/blob/main/Open_Lab_2_working_with_multiple_datasets_in_pandas/Open_Lab_2_working_with_multiple_datasets_in_pandas.ipynb) - a version of this notebook without code filled in for the guided activity and exercises. Use the unfilled version to learn these materials or lead a workshop session.

### Learning resources

- [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/index.html) - a free, online version of Jake VanderPlas' introduction to data science with Python, includes a chapter on data manipulation with pandas.
- [Python Programming for Data Science](https://www.tomasbeuzen.com/python-programming-for-data-science/README.html) - a website providing a great overview of conducting data science with Python including pandas.
- [Real Python](https://realpython.com/) contains many tutorials at different levels.
- [LinkedIn Learning](https://www.lynda.com/Python-training-tutorials/415-0.html) is free with NC State accounts and contains several video series for learning Python.
- [Dataquest](https://www.dataquest.io/) is a series of courses with an emphasis on data science that is free at first, then paid.

### Finding help with pandas

The [pandas website](https://pandas.pydata.org/) and [online documentation](http://pandas.pydata.org/pandas-docs/stable/) are useful resources, and of course the indispensable [Stack Overflow has a "pandas" tag](https://stackoverflow.com/questions/tagged/pandas). There is also a (much younger, much smaller) [sister site dedicated to Data Science questions that has a "pandas" tag](https://datascience.stackexchange.com/questions/tagged/pandas) too.

## Evaluation Survey
Please spend 1 minute answering these questions to help us improve future workshops.

[go.ncsu.edu/dvs-eval](https://go.ncsu.edu/dvs-eval)

## Credits

This workshop was created by Claire Cahoon and Walt Gurley, adapted from previous workshop materials by Scott Bailey and Simon Wiles, of Stanford Libraries.