# Introduction to Tabular Data: The Tips Dataset

Data comes in many formats and shapes. In this module we will explore tabular data using the built-in tips dataset from seaborn. Although the tips dataset is not a time series, the techniques used here to inspect and analyse it carry over to time series data.

## Loading Data

One of the most popular Python packages for working with tabular data is pandas. We will also use seaborn to load the tips dataset. First, we `import` the required packages. By convention, pandas is imported under the short name `pd` and seaborn under `sns`.

In [None]:
import pandas as pd
import seaborn as sns

Next we load the tips dataset using seaborn’s `load_dataset()` function. The data is returned as a DataFrame – the main data structure in pandas for working with tabular data.

In [None]:
# Load the tips dataset
df = sns.load_dataset('tips')

Before we do anything else, it's a good idea to inspect the DataFrame. We can use methods like `head()` and `tail()` to see the first and last few rows.

In [None]:
# Print the first five rows
print(df.head())

# Print the last five rows
print(df.tail())

Other methods and attributes give us an overview of the DataFrame. For example, `shape` gives the number of rows and columns as a tuple, while `info()` prints each column's name, non-null count, and data type.

In [None]:
# Print rows and columns
print("Rows and columns:", df.shape)

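# Print summary info (info() prints its report itself and returns None,
# so wrapping it in print() would add a stray "None" line)
print("Info")
df.info()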
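As a small illustration of these overview tools, the sketch below unpacks `shape` into separate row and column counts and uses the `dtypes` attribute to list only the column types. The `sample` DataFrame and its values are made up for this example so that it runs without downloading anything; it is a stand-in for the tips data, not part of it.

```python
import pandas as pd

# Hypothetical stand-in for the tips data: the values are made up for illustration
sample = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0],
    "tip": [1.5, 3.0, 4.5],
    "day": ["Fri", "Sat", "Fri"],
})

# shape is a (rows, columns) tuple, so it can be unpacked into two names
n_rows, n_cols = sample.shape
print("Rows:", n_rows, "Columns:", n_cols)

# dtypes lists each column's data type without the rest of the info() report
print(sample.dtypes)
```

The same two lines should work unchanged on `df` once the tips dataset is loaded.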

For a closer look at the data, we can use square bracket notation or `iloc` to access specific columns and rows. For instance:

In [None]:
# Access a single column (e.g. total_bill)
print(df["total_bill"])

# Access multiple columns
print(df[["total_bill", "tip"]])

# Access rows where the day is 'Fri'
print(df[df["day"] == "Fri"])

# Access the first row
print(df.iloc[0])

# Access the value in the fifth row and third column (iloc positions start at 0)
print(df.iloc[4, 2])
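To make the row filter above less mysterious, here is a minimal sketch of what happens in two steps: the comparison builds a column of `True`/`False` values, and the square brackets keep only the rows marked `True`. The `sample` DataFrame is a made-up stand-in with a few of the tips columns, and the names `is_friday`, `fridays`, and `third_tip` are chosen for this example only.

```python
import pandas as pd

# Hypothetical stand-in for the tips data: the values are made up for illustration
sample = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.5, 3.0, 4.5, 6.0],
    "day": ["Fri", "Sat", "Fri", "Sun"],
})

# The comparison produces one True/False value per row
is_friday = sample["day"] == "Fri"
print(is_friday.tolist())

# Passing that mask in square brackets keeps only the rows marked True
fridays = sample[is_friday]
print(fridays)

# iloc uses zero-based positions: row 2, column 1 is the third row's tip
third_tip = sample.iloc[2, 1]
print(third_tip)
```

Note that the filtered rows keep their original index labels, so `fridays` here is indexed 0 and 2 rather than 0 and 1.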

## Basic Operations

Pandas makes it easy to perform basic operations on our data. For example, we can calculate the `mean` of one column or the `max` of another. Other aggregation methods include `min`, `sum`, and `std` (standard deviation).

In [None]:
# Calculate the mean of 'total_bill'
print(df['total_bill'].mean())

# Find the maximum tip
print(df['tip'].max())
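The other aggregation methods mentioned above are called the same way. The sketch below tries `min`, `sum`, and `std` on a made-up `sample` DataFrame (a stand-in for the tips data, with invented values) so the results are easy to check by hand. One detail worth knowing is that pandas computes the sample standard deviation by default, dividing by n - 1 rather than n.

```python
import pandas as pd

# Hypothetical stand-in for the tips data: the values are made up for illustration
sample = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 2.0, 3.0, 4.0],
})

# Smallest tip in the column
smallest_tip = sample["tip"].min()
print(smallest_tip)

# Sum of all tips
total_tips = sample["tip"].sum()
print(total_tips)

# Sample standard deviation of the tips (pandas divides by n - 1 by default)
tip_std = sample["tip"].std()
print(tip_std)
```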

### Exercise 1

Print the following information, using the cell below for your code.

* The revenue generated over the period.
* How many customers visited over the period.
* The average spend per customer.

In [None]:
## YOUR CODE GOES HERE

### Exercise 2

Compare the *average* `tip` at Lunch and Dinner. Which meal time has the higher average tip? Use the cell below to show your work.

In [None]:
## YOUR CODE GOES HERE