Project Part 3
===============

In [None]:
import pandas as pd
import xlwings as xw
import matplotlib.pyplot as plt
from statsmodels.formula.api import ols
import StatTests  # project helper module that provides MeansTest
import numpy as np
from scipy import stats

In [None]:
df1 = pd.read_csv('Seller.csv')
df2 = pd.read_csv('Vehicle.csv')
df = pd.merge(df1, df2, on='unique_motorcycle_id')
df
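`pd.merge` performs an inner join by default, so any motorcycle whose `unique_motorcycle_id` appears in only one of the two CSV files is silently dropped from `df`. The sketch below uses small, made-up frames (not the project data) to show that behavior. It also shows two real `pd.merge` options, `validate='one_to_one'` (raises an error if an ID is duplicated) and `indicator=True` (adds a `_merge` column), which could be used to check the join.

```python
import pandas as pd

# Hypothetical toy frames standing in for Seller.csv and Vehicle.csv
sellers = pd.DataFrame({'unique_motorcycle_id': [1, 2, 3],
                        'seller_type': ['Individual', 'Dealer', 'Individual']})
vehicles = pd.DataFrame({'unique_motorcycle_id': [2, 3, 4],
                         'km_driven': [12000, 350, 8000]})

# Default how='inner'; validate raises MergeError if an ID repeats on either side
inner = pd.merge(sellers, vehicles, on='unique_motorcycle_id', validate='one_to_one')
print(len(inner))  # 2: only IDs 2 and 3 appear in both frames

# An outer join with indicator=True labels each row 'both', 'left_only' or 'right_only'
check = pd.merge(sellers, vehicles, on='unique_motorcycle_id', how='outer', indicator=True)
print(check['_merge'].value_counts())
```

Comparing `len(df)` with `len(df1)` and `len(df2)` is a quick way to see whether the real merge dropped any rows.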

Report Section
-------

I want to see how certain factors, such as kilometers driven and seller type, affect the showroom and selling prices of motorcycles. I'd also like to see which types of previous owners had motorcycles with the most kilometers driven. To do this, we will perform one regression, one ANOVA, and one t-test.

### Q1: Does the number of kilometers driven affect a motorcycle's showroom price?

In [None]:
# Formula is "response ~ predictor": showroom price is modeled as a function of kilometers driven
model = ols('ex_showroom_price ~ km_driven', data=df).fit()
print(model.summary())

#### Analysis and Conclusion

The p-value for the slope is about 0.04, which is below α = 0.05, so we reject the null hypothesis that kilometers driven has no linear relationship with showroom price. Because the slope coefficient is negative, the showroom price tends to decrease as kilometers driven increase.
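The Q1 conclusion rests on the t test for the regression slope. As a numpy-only sketch with made-up numbers (not the project data), the closed-form simple-regression fit below shows that the slope's sign and t statistic are identical whichever of the two variables is treated as the response, so the direction and p-value of the relationship do not depend on that choice, even though the coefficient values and their units do. The helper name `slope_and_t` is hypothetical.

```python
import numpy as np

def slope_and_t(x, y):
    """OLS fit of y = a + b*x; returns (slope b, t statistic for b)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    sxx = np.sum((x - x.mean()) ** 2)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    b = sxy / sxx                      # least-squares slope
    a = y.mean() - b * x.mean()        # least-squares intercept
    resid = y - (a + b * x)
    s2 = np.sum(resid ** 2) / (n - 2)  # residual variance, n - 2 degrees of freedom
    se_b = np.sqrt(s2 / sxx)           # standard error of the slope
    return b, b / se_b

# Hypothetical toy data: mileage (km) and showroom price
km = [1000, 5000, 12000, 20000, 35000]
price = [90000, 88000, 86000, 80000, 79000]

b_price_on_km, t1 = slope_and_t(km, price)  # price ~ km
b_km_on_price, t2 = slope_and_t(price, km)  # km ~ price (roles swapped)
print(b_price_on_km < 0, b_km_on_price < 0, np.isclose(t1, t2))  # True True True
```

Algebraically, both t statistics equal r·√(n−2)/√(1−r²), where r is the sample correlation, which is why they agree.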

### Q2: Is the average number of kilometers driven the same across previous-owner groups?

In [None]:
firstOwner = df[df['owner'] == '1st owner']
secondOwner = df[df['owner'] == '2nd owner']
thirdOwner = df[df['owner'] == '3rd owner']
alpha = 0.05
f, p_val = stats.f_oneway(firstOwner['km_driven'],secondOwner['km_driven'],thirdOwner['km_driven'])
print("This is a test of equal means with Ho: The means of all groups are equal/Ha: At least one group mean is different")
print(f"The F test statistic is {round(f,3)} and the p-value is {round(p_val,4)}")
if p_val < alpha:
    print("Conclusion: Reject Ho: At least one group mean is different")
    ANOVA_type = "ANOVA: At least one group mean is different"
else:
    print("Conclusion: Fail to reject Ho: No significant difference between group means")
    ANOVA_type = "ANOVA: No significant difference between group means"
print("\n")
# Report the conclusion from the computed p-value rather than a hard-coded sentence
print(f"Kilometers driven by previous owner type -> {ANOVA_type}")
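The F statistic that `stats.f_oneway` reports is the between-group mean square divided by the within-group mean square. The numpy sketch below computes it by hand on made-up data; `one_way_f` is a hypothetical helper, and `stats.f_oneway(*groups)` should return the same statistic. It also collects the groups with `groupby`, which, unlike three hard-coded filters, would include any other owner categories present in the column.

```python
import numpy as np
import pandas as pd

def one_way_f(groups):
    """One-way ANOVA F = (SS_between / (k - 1)) / (SS_within / (n - k))."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical toy data, not the project's km_driven values
toy = pd.DataFrame({'owner': ['1st owner'] * 3 + ['2nd owner'] * 3 + ['3rd owner'] * 3,
                    'km_driven': [1, 2, 3, 4, 5, 6, 7, 8, 9]})
groups = [g['km_driven'] for _, g in toy.groupby('owner')]
f_stat = one_way_f(groups)
print(f_stat)  # 27.0
```

Here the group means are 2, 5 and 8, so SS_between = 54 on 2 degrees of freedom and SS_within = 6 on 6 degrees of freedom, giving F = 27 / 1 = 27.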

### Q3: Is the average selling price the same between the different seller types?

In [None]:
StatTests.MeansTest(df,'seller_type','selling_price')
# Interpretation: compare the p-value reported by MeansTest above with alpha = 0.05 to decide whether average selling price differs by seller type
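`StatTests` is a project helper whose internals are not shown here, so it is not certain whether `MeansTest` assumes equal variances. As a hedged cross-check, the numpy sketch below computes Welch's two-sample t statistic, which does not assume equal variances, along with its Welch-Satterthwaite degrees of freedom. The helper name `welch_t` and the sample values are hypothetical; `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with a p-value.

```python
import numpy as np

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va = a.var(ddof=1) / len(a)  # squared standard error of mean(a)
    vb = b.var(ddof=1) / len(b)  # squared standard error of mean(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, dof

# Hypothetical selling prices for two seller types (not the project data)
individual = [1, 2, 3, 4]
dealer = [2, 4, 6, 8]
t, dof = welch_t(individual, dealer)
print(round(t, 4), round(dof, 4))  # -1.7321 4.4118
```

With the real data, the two samples would be `df.loc[df['seller_type'] == ..., 'selling_price']` for each seller type.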

### Overall Conclusion

We've concluded that kilometers driven is likely associated with the showroom price of a motorcycle, with higher mileage corresponding to a lower showroom price. Mean kilometers driven also varies among the different types of previous owners (1st, 2nd, or 3rd), and mean selling price varies with the type of seller (individual or dealership).

Menu Section
------

In [162]:
def runMenu(df):
    """Text menu for exploring the merged motorcycle data; loops until option 9."""
    numeric_cols = list(df.select_dtypes('number').columns)
    while True:
        print("\nMENU")
        print("1. Find Maximum Value")
        print("2. Find Minimum Value")
        print("3. Find Average Value")
        print("4. Open DataFrame in Excel")
        print("5. Regression for Showroom Price by Kilometers Driven")
        print("6. ANOVA Test: Kilometers Driven by Previous Owner")
        print("7. T test: Price by Seller Type")
        print("8. Scatter Plot showing the Kilometers Driven by Year")
        print("9. Quit")
        try:
            menu_choice = int(input("Select an option: "))
        except ValueError:
            # Non-numeric input: show the error once and redisplay the menu
            print("ERROR: Please enter a number between 1 and 9.")
            continue
        if menu_choice not in range(1, 10):
            print("ERROR: Please enter a number between 1 and 9.")
            continue
        if menu_choice == 9:
            break
        print("\n")
        if menu_choice in (1, 2, 3):
            # Options 1-3 share the same column prompt; only the statistic differs
            print("\nCOLUMNS")
            print(numeric_cols)
            col = input("Select a column: ")
            if col not in numeric_cols:
                print("ERROR: Please enter one of the column names listed above.")
                continue
            if menu_choice == 1:
                print(df[col].max())
            elif menu_choice == 2:
                print(df[col].min())
            else:
                print(df[col].mean())
        elif menu_choice == 4:
            # Save to .xlsx (needs an Excel writer engine such as openpyxl), then open it with xlwings
            df.to_excel('PYProjectPart3Data.xlsx')
            xw.Book('PYProjectPart3Data.xlsx')
        elif menu_choice == 5:
            # Showroom price is the response and kilometers driven the predictor
            model = ols('ex_showroom_price ~ km_driven', data=df).fit()
            print(model.summary())
        elif menu_choice == 6:
            alpha = 0.05
            owners = ['1st owner', '2nd owner', '3rd owner']
            groups = [df.loc[df['owner'] == o, 'km_driven'] for o in owners]
            f, p_val = stats.f_oneway(*groups)
            print("This is a test of equal means with Ho: The means of all groups are equal/Ha: At least one group mean is different")
            print(f"The F test statistic is {round(f,3)} and the p-value is {round(p_val,4)}")
            if p_val < alpha:
                print("Conclusion: Reject Ho: At least one group mean is different")
            else:
                print("Conclusion: Fail to reject Ho: No significant difference between group means")
        elif menu_choice == 7:
            StatTests.MeansTest(df, 'seller_type', 'selling_price')
        elif menu_choice == 8:
            # Assumes the merged data has a 'year' column, as the menu label implies
            plt.scatter(df['year'], df['km_driven'])
            plt.xlabel('Year')
            plt.ylabel('Kilometers Driven')
            plt.show()
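The menu validates input by converting it to an integer and checking that it is a listed option, then branches on the number. A possible refinement, sketched below in plain Python, moves that validation into a function that can be tested without calling `input()`, and replaces the chain of `if`/`elif` branches for options 1-3 with a dispatch table. The names `parse_choice`, `STAT_FUNCS` and `run_stat` are hypothetical, and the built-in `max`/`min` plus `statistics.mean` stand in for the pandas `Series` methods used in `runMenu`.

```python
import statistics

def parse_choice(raw, valid):
    """Return raw as an int if it is one of the valid options, else None."""
    try:
        choice = int(raw)
    except ValueError:
        return None
    return choice if choice in valid else None

# Dispatch table: option number -> function applied to a column of numbers
STAT_FUNCS = {1: max, 2: min, 3: statistics.mean}

def run_stat(choice, values):
    """Apply the statistic registered for this menu option."""
    return STAT_FUNCS[choice](values)

print(parse_choice('3', STAT_FUNCS))    # 3
print(parse_choice('abc', STAT_FUNCS))  # None
print(parse_choice('7', STAT_FUNCS))    # None
print(run_stat(3, [1000, 2000, 6000]))  # 3000
```

Adding a new statistic then only requires a new dictionary entry rather than another copied branch.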

In [163]:
runMenu(df)


MENU
1. Find Maximum Value
2. Find Minimum Value
3. Find Average Value
4. Open DataFrame in Excel
5. Regression for Showroom Price by Kilometers Driven
6. ANOVA Test: Kilometers Driven by Previous Owner
7. T test: Price by Seller Type
8. Scatter Plot showing the Kilometers Driven by Year
9. Quit


Select an option:  9
