# Pandas for dataframes

## Data description

The exercises below use the dataset *tips.csv*, whose columns are the following variables:

- **total_bill**: total amount of the bill (dollars)
- **tip**: amount of the tip (dollars)
- **sex**: [Female, Male], gender of the person who pays the bill
- **smoker**: [No, Yes], whether the group includes smokers
- **day**: ['Sun', 'Sat', 'Thur', 'Fri'], day of the week
- **time**: [Dinner, Lunch], approximate time of day
- **size**: number of people in the group

In [61]:
# Import pandas

import pandas as pd

Import the data saved in *tips.csv* as a pandas DataFrame named tips_data.

Call the .head() method on the DataFrame to see its first 5 rows.

In [2]:
# Import the data
tips_data = pd.read_csv('tips.csv')
# Use the head() method to see the first 5 rows of the data
tips_data.head(5)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


How many rows and columns does the DataFrame have? (Use the shape attribute.)

In [4]:
tips_data.shape

(244, 7)
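
Since shape is a plain (rows, columns) tuple, it can be unpacked into two variables. The following is a minimal sketch; `sample` is a hypothetical three-row DataFrame built by hand from the first rows shown above, so that the snippet runs without *tips.csv*.

```python
import pandas as pd

# Hypothetical sample: first three rows of the head() output, three columns only.
sample = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01],
        "tip": [1.01, 1.66, 3.50],
        "size": [2, 3, 3],
    }
)

# shape is a (number_of_rows, number_of_columns) tuple.
n_rows, n_cols = sample.shape
print(f"{n_rows} rows, {n_cols} columns")
```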

## Data analysis

In [5]:
def print_min_max(column):
    """Print the minimum and maximum values of a column of tips_data."""
    print(f'The min value for the column "{column}" is:', tips_data[column].min())
    print(f'The max value for the column "{column}" is:', tips_data[column].max())

In [6]:
print_min_max("tip")
print()
print_min_max("total_bill")
print()
print_min_max("size")

The min value for the column "tip" is: 1.0
The max value for the column "tip" is: 10.0

The min value for the column "total_bill" is: 3.07
The max value for the column "total_bill" is: 50.81

The min value for the column "size" is: 1
The max value for the column "size" is: 6
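
As a possible alternative to calling a helper once per column, the agg method can compute both statistics for several columns in one call. This is only a sketch: `sample` is a hypothetical DataFrame made of three rows copied from outputs in this notebook (indices 67, 125 and 170), chosen so that it contains the extremes printed above.

```python
import pandas as pd

# Hypothetical sample: three rows copied from outputs shown in this notebook,
# so the snippet runs without tips.csv.
sample = pd.DataFrame(
    {
        "total_bill": [3.07, 29.80, 50.81],
        "tip": [1.00, 4.20, 10.00],
        "size": [1, 6, 3],
    },
    index=[67, 125, 170],
)

# The result has one row per statistic and one column per variable.
summary = sample[["tip", "total_bill", "size"]].agg(["min", "max"])
print(summary)
```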


## Data filtering

### 1
<br>
To obtain all records in which the tip is equal to ten.

In [24]:
# Build a boolean Series indicating whether each tip is equal to 10

tips_diez = tips_data.tip == 10
tips_diez.head(10)

0    False
1    False
2    False
3    False
4    False
5    False
6    False
7    False
8    False
9    False
Name: tip, dtype: bool

In [25]:
# To obtain the dataframe where the tip is 10

tips_diez = tips_data.tip == 10
tips_data[tips_diez]

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
170,50.81,10.0,Male,Yes,Sat,Dinner,3
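
A boolean mask can also be counted, or combined with a column selection through .loc. The sketch below assumes a hypothetical `sample` DataFrame built from rows 0, 142 and 170 of the outputs in this notebook; the names `is_ten`, `n_matches` and `ten_tips` are illustrative only.

```python
import pandas as pd

# Hypothetical sample: rows 0, 142 and 170 copied from outputs in this notebook.
sample = pd.DataFrame(
    {
        "total_bill": [16.99, 41.19, 50.81],
        "tip": [1.01, 5.00, 10.00],
        "day": ["Sun", "Thur", "Sat"],
    },
    index=[0, 142, 170],
)

is_ten = sample["tip"] == 10

# True counts as 1, so summing the mask counts the matching rows.
n_matches = int(is_ten.sum())

# .loc takes the mask for the rows and a list of labels for the columns.
ten_tips = sample.loc[is_ten, ["total_bill", "tip"]]
print(n_matches)
print(ten_tips)
```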


### 2
<br>
To obtain all records where the number of people (size) is equal to five or six.

In [42]:
# Long way

size_five = tips_data['size'] == 5
size_six = tips_data['size'] == 6
all_filters = size_five | size_six
tips_data[all_filters]

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
125,29.8,4.2,Female,No,Thur,Lunch,6
141,34.3,6.7,Male,No,Thur,Lunch,6
142,41.19,5.0,Male,No,Thur,Lunch,5
143,27.05,5.0,Female,No,Thur,Lunch,6
155,29.85,5.14,Female,No,Sun,Dinner,5
156,48.17,5.0,Male,No,Sun,Dinner,6
185,20.69,5.0,Male,No,Sun,Dinner,5
187,30.46,2.0,Male,Yes,Sun,Dinner,5
216,28.15,3.0,Male,Yes,Sat,Dinner,5


In [43]:
# Short way

size_five_six = (tips_data['size'] == 5) | (tips_data['size'] == 6)
tips_data[size_five_six]

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
125,29.8,4.2,Female,No,Thur,Lunch,6
141,34.3,6.7,Male,No,Thur,Lunch,6
142,41.19,5.0,Male,No,Thur,Lunch,5
143,27.05,5.0,Female,No,Thur,Lunch,6
155,29.85,5.14,Female,No,Sun,Dinner,5
156,48.17,5.0,Male,No,Sun,Dinner,6
185,20.69,5.0,Male,No,Sun,Dinner,5
187,30.46,2.0,Male,Yes,Sun,Dinner,5
216,28.15,3.0,Male,Yes,Sat,Dinner,5
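
When the same column is compared against several values, Series.isin expresses the filter without chaining | operators. A minimal sketch, assuming a hypothetical `sample` DataFrame built from rows 0, 125, 142 and 155 of the outputs above:

```python
import pandas as pd

# Hypothetical sample: rows copied from outputs in this notebook.
sample = pd.DataFrame(
    {
        "total_bill": [16.99, 29.80, 41.19, 29.85],
        "tip": [1.01, 4.20, 5.00, 5.14],
        "size": [2, 6, 5, 5],
    },
    index=[0, 125, 142, 155],
)

# isin returns True where the value is one of the listed values.
size_five_six = sample["size"].isin([5, 6])

# Same mask written with explicit comparisons, for reference.
long_form = (sample["size"] == 5) | (sample["size"] == 6)

print(sample[size_five_six])
```

Adding or removing an accepted value then only means editing the list passed to isin.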


### 3
<br>
To obtain all records where the party has smokers.

In [47]:
smokers = tips_data.smoker == 'Yes'
tips_data[smokers].head(10)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
56,38.01,3.0,Male,Yes,Sat,Dinner,4
58,11.24,1.76,Male,Yes,Sat,Dinner,2
60,20.29,3.21,Male,Yes,Sat,Dinner,2
61,13.81,2.0,Male,Yes,Sat,Dinner,2
62,11.02,1.98,Male,Yes,Sat,Dinner,2
63,18.29,3.76,Male,Yes,Sat,Dinner,4
67,3.07,1.0,Female,Yes,Sat,Dinner,1
69,15.01,2.09,Male,Yes,Sat,Dinner,2
72,26.86,3.14,Female,Yes,Sat,Dinner,2
73,25.28,5.0,Female,Yes,Sat,Dinner,2


### 4
<br>
To obtain all records where the party doesn't have smokers.

In [50]:
no_smokers = tips_data.smoker == 'No'
tips_data[no_smokers].head(10)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4
5,25.29,4.71,Male,No,Sun,Dinner,4
6,8.77,2.0,Male,No,Sun,Dinner,2
7,26.88,3.12,Male,No,Sun,Dinner,4
8,15.04,1.96,Male,No,Sun,Dinner,2
9,14.78,3.23,Male,No,Sun,Dinner,2
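
Because smoker only takes the values Yes and No, the non-smoker filter can also be written as the negation of the smoker mask with the ~ operator. The sketch below assumes a hypothetical `sample` DataFrame built from rows 0, 1, 56 and 67 of the outputs above; if the column could hold other or missing values, the two forms would no longer be interchangeable.

```python
import pandas as pd

# Hypothetical sample: rows copied from outputs in this notebook.
sample = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 38.01, 3.07],
        "tip": [1.01, 1.66, 3.00, 1.00],
        "smoker": ["No", "No", "Yes", "Yes"],
    },
    index=[0, 1, 56, 67],
)

smokers = sample["smoker"] == "Yes"

# ~ flips every True/False value in the mask.
no_smokers = sample[~smokers]
print(no_smokers)
```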


### 5
<br>
To obtain all records where the tips are equal to five, seven or ten.

In [57]:
tips_five_seven_ten = (tips_data['tip'] == 5) | (tips_data['tip'] == 7) | (tips_data['tip'] == 10)
tips_data[tips_five_seven_ten]

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
11,35.26,5.0,Female,No,Sun,Dinner,4
39,31.27,5.0,Male,No,Sat,Dinner,3
46,22.23,5.0,Male,No,Sun,Dinner,2
73,25.28,5.0,Female,Yes,Sat,Dinner,2
83,32.68,5.0,Male,Yes,Thur,Lunch,2
142,41.19,5.0,Male,No,Thur,Lunch,5
143,27.05,5.0,Female,No,Thur,Lunch,6
156,48.17,5.0,Male,No,Sun,Dinner,6
170,50.81,10.0,Male,Yes,Sat,Dinner,3
185,20.69,5.0,Male,No,Sun,Dinner,5


### 6
<br>
To obtain all records where the day is either Saturday or Sunday.

In [60]:
day_ss = (tips_data['day'] == 'Sat') | (tips_data['day'] == 'Sun')
tips_data[day_ss].head(10)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4
5,25.29,4.71,Male,No,Sun,Dinner,4
6,8.77,2.0,Male,No,Sun,Dinner,2
7,26.88,3.12,Male,No,Sun,Dinner,4
8,15.04,1.96,Male,No,Sun,Dinner,2
9,14.78,3.23,Male,No,Sun,Dinner,2


### 7
<br>
To obtain all records where the number of people is greater than four and the person who pays the bill is a woman.
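
One possible way to approach this exercise is to combine two conditions with the & operator, in the same style as the | filters used earlier. This is a sketch, not the notebook's own solution: `sample` is a hypothetical DataFrame built from rows 4, 125, 141, 142 and 155 of the outputs above, and `large_party_women` is an illustrative name.

```python
import pandas as pd

# Hypothetical sample: rows copied from outputs in this notebook.
sample = pd.DataFrame(
    {
        "total_bill": [24.59, 29.80, 34.30, 41.19, 29.85],
        "sex": ["Female", "Female", "Male", "Male", "Female"],
        "size": [4, 6, 6, 5, 5],
    },
    index=[4, 125, 141, 142, 155],
)

# Each comparison needs its own parentheses because & binds tighter than > and ==.
large_party_women = (sample["size"] > 4) & (sample["sex"] == "Female")
print(sample[large_party_women])
```

The comparison is strict, so the party of exactly four people at index 4 is excluded.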