# Python/Pandas Assessment

### Setup
- Download a copy of this notebook. 
- Run `echo pandas_assessment.ipynb >> .gitignore` in your terminal to ensure that this assessment _does not_ get pushed to GitHub. Because sharing test questions is an academic integrity issue, we want to avoid that issue entirely.
- Upload your completed notebook to the appropriate Google Classroom assignment.

### Orientation
- There are 10 exercises on this assessment worth 10 points each.
- Credit is given for programmatic solutions only; your code shows your work. Because the expected answer is visible in the unit test code, a function that simply hard-codes it (for example, `return 44`) will not earn credit.
- Your Python/pandas code should run without errors.
- After each problem prompt, there is a cell to write your code followed by another cell with a unit test. To run the tests, uncomment those lines of code.
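
As a rough illustration of how the unit test cells behave, here is a minimal sketch of `assert`. The function `add_one` is hypothetical and is not part of the assessment.

```python
# Hypothetical helper used only to illustrate assert; not part of the assessment.
def add_one(x):
    return x + 1

# A passing assert produces no output and execution simply continues.
assert add_one(1) == 2

# A failing assert raises AssertionError, which is how a test cell signals a wrong answer.
try:
    assert add_one(1) == 3
    outcome = "passed"
except AssertionError:
    outcome = "failed"

print(outcome)
```

If a test cell raises `AssertionError`, the function ran but returned something other than the expected value; any other exception usually means the function itself has a bug.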

### Troubleshooting
If you need a fresh start, go to Kernel and then "Restart and Clear Output" in this Jupyter Notebook.

In [1]:
# Required Imports and data acquisition
import pandas as pd
from pydataset import data
import numpy as np

df = data("tips")
df.head()

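| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 1 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 2 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 3 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 4 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 5 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |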


####  EXAMPLE: Write a function named `exercise0`
- This function should accept a dataframe as its input argument
- Notice that the example function is returning the appropriate, programmatic code to obtain the solution
- The `assert` line checks the exercise solution code to ensure correctness

In [None]:
# This example function is solved below:
def exercise0(df):
    return len(df)

assert exercise0(df) == 244
print("Exercise 0 example exercise is complete.")

####  Write a function named `exercise1`
- Use the cell below to write your code
- This function should accept a dataframe as its input argument
- This function should return the highest `total_bill` value from the tips dataframe

In [None]:
# Write your code for the exercise1 function here

def exercise1(df):
    return max(df.total_bill)


In [None]:
assert exercise1(df) == 50.81
print("Exercise 1 is complete") 

####  Write a function named `exercise2`
- Use the cell below to write your code
- This function should return the smallest `tip` value from the tips dataframe
- This function should accept a dataframe as its input argument.

In [None]:
# Write your code for the exercise2 function definition here

def exercise2(df):
    return df.tip.min()


In [None]:
assert exercise2(df) == 1.0
print("Exercise 2 is complete")

####  Write a function named `exercise3`
- Use the cell below to write your `exercise3` function definition
- This function should return the number of rows that represent "Lunch" time customers
- This function should accept a dataframe as its input argument

In [None]:
# Write your code for the exercise3 function here
def exercise3(df):
    # Count the rows whose time is "Lunch"; summing a boolean mask counts the True values
    return (df.time == 'Lunch').sum()



In [None]:
assert exercise3(df) == 68
print("Exercise 3 is correct")

####  Exercise 4 is a one line of pandas code, not a function
- Use the cell below to write the code necessary to rename the `size` column to `table_size` on the `df` variable.
- Remember that `.size` is already a built-in DataFrame attribute in pandas, so `df.size` does not return this column; it helps to rename columns whose names collide with existing attributes.
- Exercise 4 code is not a function, but should be 1 line of pandas code. 
- Be certain to update the `df` variable or mutate it accordingly, so that `df` has the new column name.
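
To see why the name collision matters, here is a small sketch on a hypothetical three-row frame (`demo` is made up for illustration and is not the tips data). It assumes only the standard `DataFrame.size` property and `DataFrame.rename` method.

```python
import pandas as pd

# Hypothetical three-row frame with a column named "size", like the tips data.
demo = pd.DataFrame({"tip": [1.0, 2.5, 3.0], "size": [2, 3, 4]})

# Attribute access resolves to the built-in DataFrame.size property:
# the total number of cells (3 rows x 2 columns), not the column.
n_cells = demo.size

# Bracket access always reaches the column itself.
size_column = demo["size"].tolist()

# After renaming, attribute access reaches the column as expected.
renamed = demo.rename(columns={"size": "table_size"})
table_sizes = renamed.table_size.tolist()

print(n_cells, size_column, table_sizes)
```

Note that `rename` returns a new dataframe by default, so the result has to be assigned back (or `inplace=True` used) for the original variable to change.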

In [None]:
# Write your pandas code to rename the "size" column to "table_size"
df.rename(columns={"size": "table_size"}, inplace=True)


In [None]:
assert df.columns.tolist() == ['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'table_size']
print("Exercise 4 is complete")

#### Write a function named `exercise5`
- This function should return the percentage of lunch customers out of all customers
- You can use the full decimal or choose to round to 2 decimal places. Either answer will earn credit 
- This function should accept a dataframe as its input argument
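
One common way to get a proportion like this is to take the mean of a boolean mask. The sketch below uses a hypothetical four-row frame (`demo`), not the tips data, so the numbers are easy to check by hand.

```python
import pandas as pd

# Hypothetical data: 1 lunch row out of 4.
demo = pd.DataFrame({"time": ["Lunch", "Dinner", "Dinner", "Dinner"]})

# Comparing a column to a value yields a boolean Series: True, False, False, False.
is_lunch = demo.time == "Lunch"

count = is_lunch.sum()        # True counts as 1, so this is the number of lunch rows
proportion = is_lunch.mean()  # sum divided by the number of rows

print(count, proportion, round(proportion, 2))
```

Both `.sum()` and `.mean()` are methods, so they need the trailing parentheses; without them the function returns the method object rather than a number.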

In [None]:
# Exercise 5 code here
def exercise5(df):
    # The mean of a boolean mask is the proportion of True values
    return (df.time == 'Lunch').mean()

In [None]:
assert exercise5(df) in [0.2786885245901639, 0.28]
print("Exercise 5 is correct")

#### Exercise 6
- Write a function named `exercise6`
- This function should return the number of rows where the `total_bill` is greater than the average of all `total_bill` values.
- This function should accept a dataframe as its input argument

In [17]:
# Exercise 6 code here
def exercise6(df):
    # Average bill across all rows
    avg_bill = df['total_bill'].mean()
    # Count the rows whose bill exceeds that average
    return (df.total_bill > avg_bill).sum()

In [16]:
assert exercise6(df) == 99
print("Exercise 6 is correct")


#### Exercise 7
- Write a function named `exercise7`
- This function should return the highest `total_bill` value for Thursday dinner customers.
- This function should accept a dataframe as its input argument
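
Filtering on two conditions at once can be sketched as follows. The frame `demo` is hypothetical; the point is the `&` operator and the parentheses around each comparison, plus what happens if Python's `and` is used instead.

```python
import pandas as pd

# Hypothetical data, not the tips dataframe.
demo = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Thur"],
    "time": ["Lunch", "Dinner", "Dinner", "Dinner"],
    "total_bill": [10.0, 20.0, 30.0, 25.0],
})

# Wrap each comparison in parentheses, because & binds more tightly than ==.
mask = (demo.day == "Thur") & (demo.time == "Dinner")
highest = demo[mask].total_bill.max()

# Python's `and` asks a whole Series for a single truth value, which pandas refuses.
try:
    demo[(demo.day == "Thur") and (demo.time == "Dinner")]
    error_name = None
except ValueError as err:
    error_name = type(err).__name__

print(highest, error_name)
```

String comparisons are case-sensitive, so `"dinner"` would match no rows in a column that stores `"Dinner"`.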

In [None]:
# Exercise 7 code here
def exercise7(df):
    # Values are case-sensitive: the column holds 'Dinner', not 'dinner'
    thursday = df[df.day == 'Thur']
    thur_dinner = thursday[thursday.time == 'Dinner']
    return thur_dinner.total_bill.max()

In [None]:
assert exercise7(df) == 18.78
print("Exercise 7 is correct")

#### Exercise 8
- Write a function named `exercise8`
- This function should return the highest `total_bill` for customers eating on Thursday or Friday
- This function should accept a dataframe as its input argument
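
An "either/or" filter can be written with the `|` operator or with `Series.isin`. The sketch below, on a hypothetical frame named `demo`, suggests the two produce the same mask.

```python
import pandas as pd

# Hypothetical data, not the tips dataframe.
demo = pd.DataFrame({
    "day": ["Thur", "Fri", "Sat", "Sun"],
    "total_bill": [12.5, 40.0, 55.0, 8.0],
})

# Option 1: combine two comparisons with | (element-wise "or").
with_or = (demo.day == "Thur") | (demo.day == "Fri")

# Option 2: test membership in a list of allowed values.
with_isin = demo.day.isin(["Thur", "Fri"])

highest = demo[with_isin].total_bill.max()

print(with_or.equals(with_isin), highest)
```

`isin` tends to read better as the list of accepted values grows.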

In [None]:
# Exercise 8 code here
def exercise8(df):
    return df[df.day.isin(('Thur', 'Fri'))].total_bill.max()



In [None]:
assert exercise8(df) == 43.11
print("Exercise 8 is correct")

#### Exercise 9
- Write a function named `exercise9`
- This function should return the percent of `total_bill` values above the average `total_bill`
- We want to keep the decimal returned by dividing the number of `total_bill` values higher than the average by the total number of `total_bill` values.
- Avoid worrying about adding percentage symbols as strings or multiplying anything by 100. Keep it simple.
- This function should accept a dataframe as its input argument

In [None]:
# Exercise 9 code here:
def exercise9(df):
    avg_bill = df.total_bill.mean()
    number_above_avg = (df.total_bill > avg_bill).sum()
    # Proportion of bills above the average, kept as a decimal
    return number_above_avg / len(df)



In [None]:
assert exercise9(df) == 0.4057377049180328
print("Exercise 9 is correct")

#### Exercise 10
- Write a function named `exercise10`
- This function should take in the `prices` series as its input argument.
- This function should clean these strings by removing the dollar signs and commas, then convert them into proper floats.
- The `exercise10` function should return a series containing only floats
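
One pitfall worth sketching here: in a regular expression, `$` is an end-of-string anchor rather than a literal dollar sign. The default value of the `regex` argument to `Series.str.replace` has differed between pandas versions, so the sketch below assumes it is safest to pass `regex=False` explicitly. The values in `raw` are hypothetical.

```python
import re

import pandas as pd

# As a regex, "$" matches the empty position at the end of the string,
# so substituting it removes nothing.
unchanged = re.sub("$", "", "$5.00")

# Hypothetical price strings, not the assessment's `prices` series.
raw = pd.Series(["$10.50", "$1,000.00"])

# regex=False treats "$" and "," as literal characters.
cleaned = (
    raw.str.replace("$", "", regex=False)
       .str.replace(",", "", regex=False)
       .astype(float)
)

print(unchanged, cleaned.tolist())
```

An alternative under the same assumptions is to escape the character and use a regex on purpose, for example the pattern `r"[$,]"` with `regex=True`, which removes both characters in one pass.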

In [None]:
prices = pd.Series(["$1,234.56", "$2,345,678.99", "$123.45", "$3,333,333.99"])

In [None]:
# Write your function definition for exercise10 here

def exercise10(prices):
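    # regex=False treats "$" and "," as literal characters ("$" is an end-of-string anchor in a regex)
    prices = prices.str.replace("$", "", regex=False)
    prices = prices.str.replace(",", "", regex=False)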
    prices = prices.astype(float)
    return prices


In [None]:
assert exercise10(prices).values.tolist() == [1234.56, 2345678.99, 123.45, 3333333.99]
print("Exercise10 is correct.")

In [None]:
# Find non-smoking dinner customers

is_non_smoker = df.smoker == "No"
is_non_smoker.head()
is_dinner = df.time == "Dinner"
is_dinner.head()

is_non_smoker_dinner = is_non_smoker & is_dinner
is_non_smoker_dinner.head()

In [None]:
df[is_non_smoker]