# Explore a DataFrame


## 🏃&nbsp;&nbsp;Import the data

If you click on the file browser icon, you can see that you have access to `event_details.csv`, a file containing ticket sales for different events. The cell below uses `pandas` to import the data and preview it.

Go ahead and try to run the cell now to import and inspect the data!


```python
# Import pandas
import pandas as pd

# Import the data as a DataFrame
event_details = pd.read_csv("event_details.csv")

# Preview the DataFrame
event_details
```

|     | event_name | category_name | category_group | city | date | month | total_sold | total_sales |
|-----|------------|---------------|----------------|------|------|-------|------------|-------------|
| 0 | Jersey Boys | Musicals | Shows | New York City | 2008-05-22T00:00:00.000Z | 5 | 148 | 106226 |
| 1 | Spring Awakening | Plays | Shows | New York City | 2008-09-27T00:00:00.000Z | 9 | 145 | 84105 |
| 2 | Spamalot | Musicals | Shows | Las Vegas | 2008-09-12T00:00:00.000Z | 9 | 142 | 86495 |
| 3 | Chicago | Musicals | Shows | New York City | 2008-07-19T00:00:00.000Z | 7 | 137 | 114325 |
| 4 | August: Osage County | Plays | Shows | New York City | 2008-12-13T00:00:00.000Z | 12 | 131 | 102767 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | The Seagull | Plays | Shows | New York City | 2008-03-20T00:00:00.000Z | 3 | 61 | 41250 |
| 996 | Manhattan Transfer | Pop | Concerts | Boston | 2008-09-25T00:00:00.000Z | 9 | 61 | 37490 |
| 997 | Voodoo Music Experience | Pop | Concerts | Phoenix | 2008-08-22T00:00:00.000Z | 8 | 61 | 39831 |
| 998 | The Country Girl | Plays | Shows | New York City | 2008-03-06T00:00:00.000Z | 3 | 61 | 38963 |
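The preview only shows the first and last rows. As a hedged sketch, the snippet below reads a small inline sample (four rows copied from the preview, standing in for `event_details.csv`) and shows how `.shape` and `.head()` summarize a DataFrame, and how the `parse_dates` argument of `pd.read_csv` can turn the `date` strings into datetime values on import. Parsing dates at load time is our suggestion, not something the original cell does; the trailing `Z` in each date marks UTC.

```python
import io

import pandas as pd

# Hypothetical stand-in for event_details.csv: four rows copied from the preview above
csv_text = """event_name,category_name,category_group,city,date,month,total_sold,total_sales
Jersey Boys,Musicals,Shows,New York City,2008-05-22T00:00:00.000Z,5,148,106226
Spring Awakening,Plays,Shows,New York City,2008-09-27T00:00:00.000Z,9,145,84105
Spamalot,Musicals,Shows,Las Vegas,2008-09-12T00:00:00.000Z,9,142,86495
Manhattan Transfer,Pop,Concerts,Boston,2008-09-25T00:00:00.000Z,9,61,37490
"""

# parse_dates converts the ISO-8601 strings in `date` into datetime values while reading
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# .shape is (rows, columns); .head(n) returns the first n rows
print(sample.shape)
print(sample.head(2))

# `date` is now a datetime column instead of object (text)
print(sample.dtypes)
```

With `date` parsed, `sample["date"].dt.month` reproduces the existing `month` column, which is a quick consistency check on the data.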



Next, we can use the `.info()` method to print a concise summary of the DataFrame: each column's name, its data type, and its number of non-null values.

```python
event_details.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   event_name      1000 non-null   object
 1   category_name   1000 non-null   object
 2   category_group  1000 non-null   object
 3   city            1000 non-null   object
 4   date            1000 non-null   object
 5   month           1000 non-null   int64
 6   total_sold      1000 non-null   int64
 7   total_sales     1000 non-null   int64
dtypes: int64(3), object(5)
memory usage: 62.6+ KB
```


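We see that none of the eight columns has missing values (all 1000 entries are non-null), and that there are three numeric variables (`month`, `total_sold`, and `total_sales`). The other five columns, including `date`, are stored as `object`, which here means text strings.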
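Both observations can also be checked in code instead of being read off the summary. This is a minimal sketch on a small hand-built frame with the same column layout (values copied from the preview rows, not the full file); on the real data you would call the same methods on `event_details`.

```python
import pandas as pd

# Small hand-built frame with the same columns as event_details (values from the preview)
df = pd.DataFrame(
    {
        "event_name": ["Jersey Boys", "Spring Awakening", "Spamalot"],
        "category_name": ["Musicals", "Plays", "Musicals"],
        "category_group": ["Shows", "Shows", "Shows"],
        "city": ["New York City", "New York City", "Las Vegas"],
        "date": [
            "2008-05-22T00:00:00.000Z",
            "2008-09-27T00:00:00.000Z",
            "2008-09-12T00:00:00.000Z",
        ],
        "month": [5, 9, 9],
        "total_sold": [148, 145, 142],
        "total_sales": [106226, 84105, 86495],
    }
)

# Count missing values per column; all zeros means there is no missing data
missing_per_column = df.isna().sum()
print(missing_per_column)

# Keep only numeric columns to list the numeric variables
numeric_columns = df.select_dtypes(include="number").columns.tolist()
print(numeric_columns)
```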

## 🎨&nbsp;&nbsp;Visualize the data 
An essential skill in exploratory analysis is data visualization. Let's look at the total number of tickets sold by event category. To do so, we will [group](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html) the DataFrame by the `category_name` column and sum the tickets sold in each category.
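To see what grouping and summing does, and what the `as_index=False` argument changes, here is a toy version on five hypothetical rows (ticket counts copied from the preview). By default the group keys become the index of the result; with `as_index=False` they stay an ordinary column, which is convenient for sorting and plotting.

```python
import pandas as pd

# Hypothetical five-row sample (values copied from the preview) to illustrate groupby + sum
events = pd.DataFrame(
    {
        "category_name": ["Musicals", "Plays", "Musicals", "Plays", "Pop"],
        "total_sold": [148, 145, 142, 131, 61],
    }
)

# Default: category_name becomes the index of the resulting Series
by_index = events.groupby("category_name")["total_sold"].sum()

# as_index=False: category_name stays a regular column of a DataFrame
by_column = events.groupby("category_name", as_index=False)["total_sold"].sum()

print(by_index)
print(by_column)
```

In both results the groups come out in alphabetical order (Musicals, Plays, Pop), since `groupby` sorts the keys by default.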

```python
# Group the DataFrame by the category_name column and sum the tickets sold per category
category_totals = event_details.groupby("category_name", as_index=False)["total_sold"].sum()

# Sort the categories by total tickets sold, highest first
category_totals = category_totals.sort_values(by="total_sold", ascending=False)

# Preview the new DataFrame
category_totals
```

|   | category_name | total_sold |
|---|---------------|------------|
| 3 | Pop | 38198 |
| 2 | Plays | 18894 |
| 0 | Musicals | 12199 |
| 1 | Opera | 3056 |
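One way to compare these totals is as shares of all tickets sold. The sketch below hard-codes the four totals shown in the output (in the notebook you would work on `category_totals` directly); the `share` column is our addition, not part of the original analysis.

```python
import pandas as pd

# Category totals as displayed in the output above
category_totals = pd.DataFrame(
    {
        "category_name": ["Pop", "Plays", "Musicals", "Opera"],
        "total_sold": [38198, 18894, 12199, 3056],
    }
)

# Divide each category's tickets by the grand total to get its share of all tickets
category_totals["share"] = category_totals["total_sold"] / category_totals["total_sold"].sum()

# Round the share for display; the unrounded shares add up to 1
print(category_totals.round({"share": 3}))
```

The four categories account for 72,347 tickets in total, and Pop alone makes up just over half of them.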
