# Introduction
Welcome to my Kernel! This is my first Kaggle project that I am creating for my class final project this semester. 

The dataset is about the **housing prices in King County, WA,** whose county seat is Seattle. It is a predominantly urban and suburban county. 

More information on the place : 

[https://en.wikipedia.org/wiki/King_County,_Washington](https://en.wikipedia.org/wiki/King_County,_Washington)

![](http://upload.wikimedia.org/wikipedia/commons/b/bc/Seattle_-_King_County_Courthouse_and_King_County_Administration_Building_01.jpg)

I have first tried to **visualize the data, clean it and ultimately build a linear regression model to analyze it**.

I have used some **python packages** to carry out the analyses and build the regression model.

This project helps to solve a problem that many potential users face: the housing market is so diverse, and so many factors influence prices, that one can easily be overwhelmed. By means of this project, I want to highlight the important components of such a market in an unbiased, data-driven way. After looking at this project, one should get a good idea of the housing market in King County, WA in 2014-15.

This kernel should be helpful for **potential buyers, realtors and builders**, who can gain more insight into what influences prices and make more informed, data-driven decisions.

I would greatly appreciate any further comments/questions.

Since it was created for academic purposes, detailed inferences and documentation have been provided. Feel free to jump to the conclusion for a snapshot of the findings.



# 1.Data Exploration     

#  1.1.Importing the relevant modules and the dataset

I have imported the relevant libraries in order to carry out the analysis. **For the final project, they have been documented in the final project proposal.**

In [None]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns

#override the matplotlib style of graphs with seaborn.
sns.set()

Use the pandas method to create a dataframe and print the first 5 rows of it. 

In [None]:
raw_data = pd.read_csv("../input/housesalesprediction/kc_house_data.csv")
raw_data.head()

Get more information on the type of variable.

In [None]:
raw_data.info()

Get some **descriptive statistics** like mean, standard deviation, minimum and maximum values, etc. The "include='all'" argument is used to get statistics on both categorical and numerical columns.

#  1.2.Describing the preliminary data 

In [None]:
raw_data.describe(include='all')
#We observe no missing values at first, which we confirm later.
#There seem to be a lot of houses, some of which are exceptionally priced.

The following method checks for missing values and returns their count for each column. It is important to handle missing values in order to build an accurate model.

In [None]:
raw_data.isnull().sum()
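To make the check concrete, here is a minimal sketch on a toy dataframe. The frame `toy` and its values are assumptions made up for illustration (a missing price is planted on purpose); it is not the King County data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for raw_data (assumption); one price is deliberately missing.
toy = pd.DataFrame({
    "price": [221900.0, 538000.0, np.nan, 604000.0],
    "bedrooms": [3, 3, 2, 4],
})

# isnull() marks missing cells as True; sum() counts them per column.
missing_per_column = toy.isnull().sum()
total_missing = int(missing_per_column.sum())

# One simple way to handle them: drop every row with a missing value.
toy_clean = toy.dropna()

print(missing_per_column)
print(total_missing, len(toy_clean))
```

If every count is zero, as seems to be the case for this dataset, no rows need to be dropped.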

**The price is going to be the dependent variable, which is influenced by other variables like the number of bedrooms, bathrooms, condition, grade, waterfront, etc.**

For the final project, the variables have been explored in more detail in the documentation file supporting the project.

We drop the columns 'id' and 'date' since they don't give us much information about the price.

In [None]:
data = raw_data.drop(['id','date'],axis = 1)
data.head()

#  1.3. Correlation Matrix

We use a **heatmap** in order to find the correlation between the variables of this dataset. The next section, Data Visualization, relies primarily on this heatmap.

In [None]:
#change the size of the figure using this matplotlib method
plt.subplots(figsize=(15,10))

#plot a correlation heatmap using seaborn. Border the squares in black, show the correlation coefficient and round it to one decimal place.
sns.heatmap(data.corr(), annot = True,linewidths=.5,linecolor='black',fmt = '1.1f')

#give a title to the map and display it.
plt.title('correlation heatmap',size = 18)
plt.show()
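Since the analysis revolves around price, it can help to read the heatmap's price column as a ranked list. The sketch below does this on synthetic data; the dataframe `toy`, its columns and all its numbers are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the housing data (assumption, not the real dataset).
rng = np.random.default_rng(0)
n = 500
sqft_living = rng.uniform(500, 4000, n)
bedrooms = np.round(sqft_living / 800 + rng.normal(0, 0.7, n))
noise_col = rng.normal(0, 1, n)  # unrelated to price by construction
price = 200 * sqft_living + rng.normal(0, 100_000, n)

toy = pd.DataFrame({
    "price": price,
    "sqft_living": sqft_living,
    "bedrooms": bedrooms,
    "noise_col": noise_col,
})

# Take the 'price' column of the correlation matrix, drop the trivial
# self-correlation and sort from strongest positive to weakest.
price_corr = toy.corr()["price"].drop("price").sort_values(ascending=False)
print(price_corr)
```

On the real data the same two lines, applied to `data`, would list the variables in order of their correlation with price.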

# 2.Data Visualization

In this section, we center all the visualizations and their analyses on the dependent variable, price. 

**This data, being from a mix of urban and suburban locations, has a lot of outliers or exceptions. These cannot be fully explained because of the disparity in the price of houses across various neighborhoods.**

We can try to reason about the existence of such exceptions by looking at just the correlation, but we **can never fully explain the causation.**

#  2.1. Relationship between price and the number of bedrooms, bathrooms and floors:

* For a quick overview, we use the pairplot method of seaborn. It plots the pairwise relationships between the chosen variables of a dataset in a grid of rows and columns.
* It seems that the **number of bedrooms, bathrooms and floors have a positive correlation with price and with each other.**
* This is pretty intuitive because the higher these numbers are, the bigger the house and the costlier it would be.
* We shouldn't jump to conclusions though, since there are a lot of outliers that we will deal with on a case-by-case basis.

In [None]:
#We use this in-built seaborn method to plot the specified variables and display regression lines to summarize the trends.
sns.pairplot(data,vars = ["price","bedrooms","bathrooms","floors"], kind ="reg")

# 2.2. Number of Bedrooms and Price

We obtain the unique entries in this following column.

In [None]:
#pandas method to obtain the unique values of this variable to understand which values have been taken.
data['bedrooms'].unique()

Since the **no of bedrooms is not continuous, we can plot it using a boxplot.** A box plot (or box-and-whisker plot) shows the distribution of quantitative data in a way that facilitates comparisons between variables or across levels of a categorical variable. The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution, except for points that are determined to be “outliers” using a method that is a function of the inter-quartile range.

In [None]:
plt.subplots(figsize=(12,10))

#seaborn method to plot a boxplot using the specified variables from the dataset.
sns.boxplot(x="bedrooms", y = "price",data= data)

plt.title('price vs no of bedrooms',size = 18)
plt.show()

* Most of the houses have **3 to 4 bedrooms**.
* The horizontal line across each box denotes the median price; the lower and upper edges of the box represent the 25th and the 75th percentile respectively. The vertical line is the typical range, and outliers above the range are represented as individual points.
* Interestingly, there are **a lot of outliers**. A house with no bedrooms might be a studio apartment, and there is one with 33 bedrooms. Surprisingly, it is modestly priced in comparison to some other houses.
* For any given number of bedrooms, there are a lot of outliers which don't fall within the range. This can be because **certain houses in the Greater Seattle Area might naturally be more expensive than a suburban house with a higher number of bedrooms.**
* **Thus, this is not the only variable that explains price. We have to take into account the zipcode, size and other variables as well.**
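The "function of the inter-quartile range" mentioned above can be sketched as follows. This assumes the common 1.5 x IQR rule, which is matplotlib's default whisker setting as far as I know; the helper `iqr_fences` and the prices (in thousands) are made up for illustration.

```python
import numpy as np

def iqr_fences(values, k=1.5):
    """Return the (lower, upper) fences of the k * IQR outlier rule."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical prices in thousands of dollars; the last one is extreme.
prices = np.array([200, 250, 300, 350, 400, 450, 2000], dtype=float)

low, high = iqr_fences(prices)
# Points outside the fences are the ones a boxplot draws individually.
outliers = prices[(prices < low) | (prices > high)]

print(low, high)   # 50.0 650.0
print(outliers)    # [2000.]
```

The whiskers themselves stop at the most extreme observation that still lies inside these fences.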

# 2.3. Number of Bathrooms and Price

In [None]:
data['bathrooms'].unique()

The number of bathrooms seems to follow a **continuous distribution.** A **histogram** would be an ideal option to denote it.

In [None]:
plt.subplots(figsize=(12,10))

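#The underscore is a throwaway variable that captures the return values so that they are not printed.
_=plt.hist(data['bathrooms'],color='salmon')
_=plt.xlabel('no of bathrooms')
#a histogram shows counts on the y-axis, not price.
_=plt.ylabel('number of houses')

plt.title('distribution of no of bathrooms',size = 18)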
plt.show()

* Again, as in the case of the number of bedrooms, the price varies a lot for a given number of bathrooms (see the pairplot in section 2.1). Some houses with 2 bathrooms are priced way more than some with 4 or 5.
* We might be tempted to think that the number of bathrooms and the price show an increasing trend. We have to keep in mind that this dataset primarily comes from an urban and a suburban region. Exceptions are therefore bound to occur.

# 2.4. Number of Floors and Price

In [None]:
data['floors'].unique()

Given the distribution of the variable, a **bargraph** would be the simplest way to visualize this variable.

In [None]:
plt.subplots(figsize=(15,10))

#seaborn method for plotting a bargraph.
sns.barplot(x="floors",y="price",data=data,palette="Blues_d")

plt.title('price vs no of floors',size = 18)
plt.show()

* As expected, although there is a positive correlation between these two variables, **there isn't an obvious trend.**
* Penthouses and loft apartments in downtown Seattle might well be more expensive than a three-story suburban colonial.

# 2.5. Living Area, Waterfront, View and Price

Given the number of variables involved, a **relplot** is the best option to explore these variables.

This function provides access to several different axes-level functions that show the relationship between two variables with semantic mappings of subsets. The relationship between x and y can be shown for different subsets of the data using the hue, size, and style parameters.

We note that the **waterfront has already been converted into a dummy variable by mapping no waterfront and a waterfront to 0 and 1 respectively.**

The view just tells us how good the view of the waterfront is, if any, on a scale of 0 to 4.

In [None]:

#seaborn method for a 'relplot': view acts as a further breakdown dimension. We change the look of the graph by using another color palette.
sns.relplot(x="sqft_living",y="price",hue="waterfront",col="view",palette=["g", "r"],data=data)



* As evident from the graphs above, **a house with a good view of the waterfront definitely costs more on average.** There is one house which doesn't have as good a view but still costs more.
* As expected, the presence of a waterfront is not game-changing because a lot of houses with no such views still continue to be priced similarly.
* But for houses with a similar living area, **a very good view of the waterfront shoots the price up** (see graph no 4).


# 2.6. Living Area,Lot Size, Living Area and Lot Size in the Proximity and Price

In [None]:
sns.pairplot(data, vars = ["sqft_living","sqft_lot","sqft_basement","sqft_living15","sqft_lot15"], kind ="reg")

* There is a **lot of correlation between the living area** ('sqft_living') and the **other variables** like the size of the lot ('sqft_lot'), basement ('sqft_basement') and the living area and lot size of the nearest 15 houses ('sqft_living15' and 'sqft_lot15' respectively).
* **A large living area would definitely mean more room for a basement and a larger lot**. As evident from the correlation heatmap, a larger living area is also strongly correlated with the number of bedrooms, bathrooms and floors.
* Houses that are close to each other are similar in area and similarly priced (refer to the heatmap). This can be attributed to the clustering of similar houses into a neighborhood. **More affluent neighborhoods will also have similar prices.**
* There aren't as many exceptions in this case.

# 2.7. Grade and Price

This tells us how the grade, that is the quality of the construction materials used, might affect price.

In [None]:
data['grade'].unique()

Given the kind of distribution, using a boxplot makes sense.

In [None]:
plt.subplots(figsize=(12,10))

sns.boxplot(x="grade", y = "price",data= data,palette="Set3")

plt.title('price vs grade',size = 18)
plt.show()

* It is easy to see an obvious **upward trend** in the graph.
* Better construction materials increase the cost of labor and raw materials, which is reflected in the price.
* Although we can't deny some exceptions, they definitely aren't as deviant as some of the variables that we have seen earlier.

# 2.8. Condition and Price

This variable explores how good or bad the condition of the house is on a scale of 1 to 5.

In [None]:
data['condition'].unique()

In [None]:
plt.subplots(figsize=(12,10))

sns.boxplot(x="condition", y = "price",data= data,palette="Set1")

plt.title('price vs condition',size = 18)
plt.show()

* We observe that **a better condition doesn't necessarily imply a higher price.**
* Again, it would be too broad to generalize anything given the number of outliers.
* It is interesting to note that houses in a mediocre condition (condition=3) have a lot of outliers. It is an interesting point for further exploration by the concerned users.

# 2.9. Grade, Condition and Price

In [None]:
plt.subplots(figsize=(15,10))

#we use a scatterplot to analyze the relationship between price and grade and further break it down using condition.
sns.scatterplot(x="grade",y="price",hue="condition",size="condition",sizes=(20, 200),data=data)

plt.title('relationship between price,grade and condition',size = 18)
plt.show()

* It is pretty interesting to note that **although a better grade of materials commands a higher price, it doesn't necessarily mean that the house is in any better condition.**
* On average, **houses that were constructed using quality materials are in a similar condition to their counterparts built with mediocre materials.**
* However, houses in an excellent condition (condition=5) were mainly built using the best grade materials (12 or 13).
* Again, we have to keep in mind that other variables like the year in which the house was built and the fact that it was renovated or not might play a role.
* A quick look at the heatmap does indicate **a negative correlation between the condition of the house and the year in which it was built.** There is, however, no such correlation between condition and renovation.
* Some other variables like maintenance and deterioration due to weather conditions might have a hand. These are out of our scope though for the given dataset.


# 2.10. Year Built

This gives us an overview of the 50 most common years in which the houses were built.

In [None]:
plt.subplots(figsize=(15,15))

#value_counts() counts the houses built in each year, sorted from most to least frequent; head(50) keeps the 50 most common years and the pie chart shows their shares.
#the argument passed in displays the percentage up to the first decimal place.
data.yr_built.value_counts().head(50).plot.pie(autopct='%1.1f%%')

plt.title('year built pie chart',size = 18)
plt.show()

In [None]:
data.yr_built.value_counts()

* **A lot of houses were built in the 2000s and 2010s.**
* A quick representation of the count of unique values confirms the fact.
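Because the claim is about decades rather than single years, grouping the counts by decade makes it easier to check. A minimal sketch follows; the series `yr_built` holds made-up years, not the real column.

```python
import pandas as pd

# Hypothetical construction years standing in for data['yr_built'].
yr_built = pd.Series([1955, 1951, 1933, 1965, 1987, 2001, 2004, 2014, 2008, 1995])

# Integer-divide by 10 and multiply back to map each year to its decade.
decade = (yr_built // 10) * 10

# Count the houses per decade and list the decades chronologically.
decade_counts = decade.value_counts().sort_index()
print(decade_counts)
```

Applied to `data['yr_built']`, the same three lines would show directly how the 2000s and 2010s compare with earlier decades.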

# 2.11. Location

We plot the latitude and longitude coordinates given in the dataset. 

The axes range from negative to positive values because south latitudes and west longitudes are expressed as negative numbers.

In [None]:
plt.subplots(figsize=(12,10))

plt.scatter(data['long'],data['lat'],color="purple")

#we set the limits according to the cartographical convention: longitude spans -180 to 180 and latitude spans -90 to 90.
plt.xlim(-180,180)
plt.ylim(-90,90)

plt.xlabel('longitude')
plt.ylabel('latitude')

plt.title('distribution of houses',size = 18)
plt.show()

In [None]:
#we zoom in for a better picture.

plt.subplots(figsize=(12,10))

plt.scatter(data['long'],data['lat'],color="purple")

#note that the limits have been chosen by eye from the previous scatterplot and hence won't be 100% accurate.
#the smaller (more westerly) longitude comes first so that the x-axis is not mirrored.
plt.xlim(-122.6,-121.2)
plt.ylim(47,47.9)

plt.xlabel('longitude')
plt.ylabel('latitude')

plt.title('distribution of houses closeup',size = 18)
plt.show()

![](http://www.seattle.gov/Images/Clerk/DistrictsMap.jpg)
* This yields a very interesting result. **The majority of the houses are located in the Greater Seattle Region.**
* More location details can be found by plugging the values of the longitude and the latitude into online calculators such as this one:
[https://www.latlong.net/](https://www.latlong.net/)
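As an alternative to picking the zoom limits by eye, they can be derived from the data itself. The sketch below is one way to do it; the helper `padded_limits`, the 5% padding and the sample coordinates are assumptions for illustration.

```python
import pandas as pd

def padded_limits(values, pad_fraction=0.05):
    """Return (min, max) of a column, widened by a small margin on each side."""
    lo, hi = values.min(), values.max()
    pad = (hi - lo) * pad_fraction
    return lo - pad, hi + pad

# A few sample coordinates standing in for data[['lat', 'long']].
coords = pd.DataFrame({
    "lat": [47.5112, 47.7210, 47.7379, 47.5208],
    "long": [-122.257, -122.319, -122.233, -122.393],
})

x_min, x_max = padded_limits(coords["long"])
y_min, y_max = padded_limits(coords["lat"])
print((x_min, x_max), (y_min, y_max))
```

The results could then be passed to `plt.xlim(x_min, x_max)` and `plt.ylim(y_min, y_max)`, which also keeps the smaller value first on each axis.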


# 2.12.Living Area and Price

The **most important relationship** would perhaps be this one. This forms the basis of our next few sections.

In [None]:
plt.subplots(figsize=(12,10))

y=data['price']
x=data['sqft_living']

plt.scatter(x,y,color='green')

plt.title('price vs living area',size = 18)
plt.show()

We see an **exponential type of relationship**. We will analyze it after dealing with outliers.

# 3.Dealing with Outliers

**We draw the probability density functions of some variables to understand the concentration and distribution of each variable. We can also see the distribution of outliers and remove them.**

In [None]:
plt.subplots(figsize=(8,8))

#this in-built seaborn method plots the necessary graph.
sns.distplot(data['price'],color='crimson')

plt.title('pdf of price',size = 18)
plt.show()

We drop the observations at or above the 99th percentile of price, since most of them are exceptions.

In [None]:
plt.subplots(figsize=(8,8))

#we create a new variable to hold the 99th percentile of price; the observations above it are the most dramatic outliers.
q = data['price'].quantile(0.99)

#we store all the observations below that cutoff, that is everything except the top 1 percent, in a new dataframe. The excluded ones would normally represent some luxury houses.
data_1 = data[data['price']<q]
data_1.describe(include = "all")

sns.distplot(data_1['price'],color='crimson')

plt.title('pdf of price less than the 99th percentile',size = 18)
plt.show()

While there are still many outliers, they are far fewer than in the previous case. Getting rid of all the outliers at once would make our model incapable of explaining exceptions altogether and paint an inaccurate and biased picture of housing prices.

We identify the variables with the most significant number of outliers from our previous analyses and drop the observations above the 99th percentile.
Every time we remove the outliers of a particular variable, we create a new dataframe with updated values.
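The repeated filter-and-store step can be sketched as a small helper. The function `drop_top_percentile` and the toy dataframe are hypothetical and only illustrate the pattern; they are not part of the original analysis.

```python
import pandas as pd

def drop_top_percentile(df, column, q=0.99):
    """Return the rows of df whose value in `column` is below the q-th quantile."""
    cutoff = df[column].quantile(q)
    return df[df[column] < cutoff]

# Toy frame (assumption): 100 houses with made-up prices and living areas.
toy = pd.DataFrame({
    "price": range(1, 101),
    "sqft_living": [10 * i for i in range(100, 0, -1)],
})

# Each call returns a new, smaller dataframe, mirroring data_1, data_2, ...
trimmed_once = drop_top_percentile(toy, "price")
trimmed_twice = drop_top_percentile(trimmed_once, "sqft_living")
print(len(toy), len(trimmed_once), len(trimmed_twice))  # 100 99 98
```

One caveat: because the comparison is a strict "less than", a discrete column such as the number of bedrooms can lose every row that equals the cutoff value, which may be more than 1% of the data.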

In [None]:
plt.subplots(figsize=(8,8))

sns.distplot(data_1['bedrooms'],color='m')

plt.title('pdf of no of bedrooms',size = 18)
plt.show()

In [None]:
plt.subplots(figsize=(8,8))

p=data_1['bedrooms'].quantile(0.99)
data_2 = data_1[data_1['bedrooms']<p]

sns.distplot(data_2['bedrooms'],color='m')

plt.title('pdf of no of bedrooms less than 99th percentile',size = 18)
plt.show()

In [None]:
plt.subplots(figsize=(12,10))

sns.distplot(data_2['sqft_living'],color='pink')

plt.title('pdf of living area',size = 18)
plt.show()

In [None]:
plt.subplots(figsize=(8,8))
p = data_2['sqft_living'].quantile(0.99)

data_3 = data_2[data_2['sqft_living']<p]

sns.distplot(data_3['sqft_living'],color='pink')

plt.title('pdf of living area less than 99th percentile',size = 18)
plt.show()

In [None]:
plt.subplots(figsize=(8,8))

#we plot from data_3, the most recent dataframe, so that the living area filter is included.
sns.distplot(data_3['sqft_basement'],color='cyan')

plt.title('pdf of basement',size = 18)

plt.show()

In [None]:
plt.subplots(figsize=(8,8))

p = data_3['sqft_basement'].quantile(0.99)
data_4 = data_3[data_3['sqft_basement']<p]

sns.distplot(data_4['sqft_basement'],color='cyan')

plt.title('pdf of basement less than 99th percentile',size = 18)
plt.show()

# 4.Regression Model

# 4.1.New DataFrame

We reset the indices of the previous dataframe in order to make the process easier.

In [None]:
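#note that we start from data_3, so the basement filter that produced data_4 is not applied to the model.
data_model=data_3.reset_index(drop=True)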
data_model.describe(include="all")

# 4.2.Checking the Assumptions for the Ordinary Least Squares Method(For final project, see documentation).

**We make sure that the assumptions of normality and no multicollinearity aren't violated.**

# 4.2.1.Normality by Logarithmic Transformation

In our previous analysis, we couldn't spot an obvious linear relationship between the living area and price. **We convert the price using logarithms in order to linearize it.**

In [None]:
#we can directly use the numpy method to convert all the price datapoints and store them in a new variable.
log_price=np.log(data_model['price'])

#we store the new variable in a new column in the existing dataframe.
data_model['log_price'] = log_price

data_model.head()

In [None]:
plt.subplots(figsize=(8,8))

y=data_model['log_price']
x=data_model['sqft_living']

plt.scatter(x,y,color='green')

plt.title('log price vs living area',size = 18)
plt.show()

As evident, the relationship is now much clearer and easier to interpret.
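To illustrate why the logarithm helps, here is a minimal sketch on synthetic data. It assumes an exponential relationship with multiplicative noise; the coefficients and variable names are invented and do not come from the dataset.

```python
import numpy as np

# Synthetic data (assumption): price grows exponentially with living area.
rng = np.random.default_rng(42)
sqft = rng.uniform(500, 4000, 1000)
price = 100_000 * np.exp(0.001 * sqft + rng.normal(0, 0.2, 1000))

# Linear correlation before and after taking the logarithm of price.
corr_raw = np.corrcoef(sqft, price)[0, 1]
corr_log = np.corrcoef(sqft, np.log(price))[0, 1]

# A straight-line fit on the log scale recovers the growth rate used above.
slope, intercept = np.polyfit(sqft, np.log(price), 1)

print(round(corr_raw, 3), round(corr_log, 3))
print(slope)
```

On the log scale each extra square foot multiplies the price by a roughly constant factor, which is why a straight line fits better.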

# 4.2.2.Multicollinearity Using Variance Inflation Factor

The variance inflation factor (VIF) tells us how much of a variable's behavior can be explained by the other variables. We will use it in tandem with the heatmap to identify **multicollinearity**.

In [None]:
data_model.columns.values

In [None]:
#please note that this function has been used directly as per the statsmodels documentation. It computes the vif of one column at a time, so we loop over the columns. The algorithm is cited and can be found in the accompanying documentation.
from statsmodels.stats.outliers_influence import variance_inflation_factor

variables = data_model[[ 'bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot','floors', 'waterfront', 'view', 'condition', 'grade', 'sqft_above','sqft_basement', 'yr_built', 'yr_renovated', 'zipcode', 'lat','long', 'sqft_living15', 'sqft_lot15']]

vif = pd.DataFrame()

vif["VIF"] = [variance_inflation_factor(variables.values, i) for i in range(variables.shape[1])]
vif["Features"] = variables.columns

In [None]:
vif

**The vif for some variables is extremely high**, with some being infinite!

We recall that **some variables related to area were extremely correlated. We drop those variables.**
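A rough sketch of where an infinite VIF can come from: the textbook definition is VIF = 1 / (1 - R²), where R² comes from regressing one variable on all the others. The helper `vif` below is a hand-rolled illustration with an intercept, so its numbers need not match the statsmodels output above. The synthetic columns assume that the living area is exactly the sum of the above-ground and basement areas, which I believe is how this dataset is built.

```python
import numpy as np

def vif(X, j):
    """Textbook VIF of column j: regress it on the other columns plus an intercept."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - resid.var() / y.var()
    # An R-squared of (numerically) 1 means the column is an exact combination
    # of the others, so the VIF is reported as infinite.
    return np.inf if np.isclose(r2, 1.0) else 1.0 / (1.0 - r2)

# Synthetic areas (assumption): living = above + basement, lot is unrelated.
rng = np.random.default_rng(1)
above = rng.uniform(500, 3000, 300)
basement = rng.uniform(0, 1000, 300)
lot = rng.uniform(2000, 10000, 300)
living = above + basement

X = np.column_stack([living, above, basement, lot])
vifs = [vif(X, j) for j in range(X.shape[1])]
print(vifs)
```

The three linked area columns come out infinite while the unrelated lot column stays close to 1, which is the pattern that motivates dropping some of the area variables.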

In [None]:
data_cleaned = data_model.drop(['sqft_living15','sqft_lot15','sqft_above','sqft_lot','sqft_basement',],axis = 1)
data_cleaned.head()

In [None]:
data_cleaned.columns.values

# 4.3.Building the Model

**We declare the independent and dependent variables.**
![](http://miro.medium.com/max/2872/1*k2bLmeYIG7z7dCyxADedhQ.png)

In [None]:
#In the above equation, the x variables are the independent variables (stored in x1) and y is the dependent variable.

x1=data_cleaned[[ 'bedrooms', 'bathrooms', 'sqft_living',
       'floors', 'waterfront', 'view', 'condition', 'grade',
        'yr_built', 'yr_renovated', 'zipcode', 'lat',
       'long']]

y=data_cleaned[['log_price']]


# 4.4.Is the Model Significant?

**We fit the model using the appropriate Ordinary Least Squares method that comes with the statsmodels.api package.**

In [None]:
#again, this method is pre-existing and can be directly used. The citation can be found in the documentation.

#this is b0. We are essentially adding a column consisting of only 1s that is equal in length to the y variable.
x= sm.add_constant(x1)

#we fit the regression model on x and y using the appropriate method and store it in a variable.
results = sm.OLS(y,x).fit()

#we summarize our findings.
results.summary()

#note that the image includes a term for the error. OLS picks the coefficients that minimize the sum of squared errors (SSE). In simpler words, we are trying to minimize the error: the lower the error, the better our model is.

* Just creating a model is not enough. We have to gauge **the significance of the model by looking at the summary table. (For the final project, see documentation for more information.)**
  
  1. **R-squared:** It tells us the share of the variation in the dependent variable that the model explains, that is, how close the data is to the fitted line. It is **pretty close at around 74%**.
  
  2. **Adjusted R-squared:** It penalizes us for adding variables that have no explanatory power. Looks like it has **passed this test** too.
  
  3. **F-statistic:** It tests whether the model as a whole explains more than a model with no independent variables. The higher the value, the better.
  
  4. **p-value:** It tells us the lowest significance level at which the null hypothesis can be rejected. Here the null hypothesis is that the independent variables don't have any explanatory power. The usual threshold is 0.05. This test is also passed.
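The first three statistics can be reproduced by hand, which may make the summary table less of a black box. The sketch below uses numpy on synthetic data; the data, the coefficients and the variable names are assumptions, and the formulas are the standard textbook ones rather than anything specific to this notebook.

```python
import numpy as np

# Synthetic regression problem (assumption): n observations, k regressors.
rng = np.random.default_rng(7)
n, k = 200, 2
X = rng.normal(size=(n, k))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=1.0, size=n)

# Add a column of ones for the intercept (the role sm.add_constant plays).
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

resid = y - A @ beta
sse = float(resid @ resid)                   # sum of squared errors
sst = float(((y - y.mean()) ** 2).sum())     # total sum of squares

r2 = 1 - sse / sst                                  # share of variation explained
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)       # penalized for k regressors
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))        # overall significance

print(round(r2, 3), round(adj_r2, 3), round(f_stat, 1))
```

The adjusted R-squared is always a little below the R-squared, and the gap grows when regressors are added without improving the fit.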

# 4.5.Regression Model 

Although our last model was pretty significant, we can still achieve a better result by trying to eliminate more correlation. On repeated trials, we can drop the following variables:

In [None]:
data_reg=data_model.drop(['long','yr_renovated'],axis=1)
data_reg.head()

In [None]:
x1=data_cleaned[[ 'bedrooms', 'bathrooms', 'sqft_living',
       'floors', 'waterfront', 'view', 'condition', 'grade',
        'yr_built', 'lat','zipcode']]
y=data_cleaned[['log_price']]

In [None]:
x= sm.add_constant(x1)
results = sm.OLS(y,x).fit()
results.summary()

* **This model is more statistically significant: the R-squared barely drops, the adjusted R-squared stays the same, the F-statistic has increased significantly and the p-value is now 0.**
* We still have a warning about strong multicollinearity. This is because certain variables, like the number of bedrooms and bathrooms, are not very strongly correlated with the living area but still have a non-negligible correlation with it. Despite this, dropping them would significantly reduce the R-squared, or the explanatory power.
* We have to keep in mind that **no model can fully capture the dataset** (if it does so, we might have an overfitting model).
* **We can therefore adopt the model, which explains around 74% of the variation in the log price.**

# 5.Predictions

# 5.1.Predicting Prices

For the OLS method, we added a constant column to the x1 variables and stored the result in 'x'. The constant column is equal in length to all the other columns.

In [None]:
#the first column displays the constant that we added earlier.
x


# 5.2.New Dataframe

We create **a new dataframe in order to predict the prices** using some other variables.

In [None]:
#we create a new dataframe with some observations.
data_with_predictions = pd.DataFrame({'const':1,'bedrooms':[3,3],'bathrooms':[1,2.25],'sqft_living':[1180,2570],'floors':[1,2],'waterfront':[0,0],'view':[0,0],'condition':[3,3], 'grade':[7,7],'yr_built':[1955,1951],'lat':[47.5112,47.7210],'zipcode':[98103,98002]})

#we order the columns to match the model and display the dataframe.
data_with_predictions=data_with_predictions[['const','bedrooms','bathrooms','sqft_living','floors','waterfront','view','condition','grade','yr_built','lat','zipcode']]

data_with_predictions

**The fitted regression is stored in the 'results' object.** We will use its predict method on the dataframe and add the predictions to our existing dataframe.

In [None]:
predictions = results.predict(data_with_predictions)
predictions

In [None]:
#we store the predictions in a new variable and attach it to the dataframe
data_with_predictions['predictions'] = predictions
data_with_predictions

# 5.3.Exponential Transformation

**We exponentiate the predicted prices, which are still in logarithms, and store the result in the dataframe.**

In [None]:
#using the inbuilt numpy method we take the exponent (the inverse of the logarithm) of the logarithmic price to get the original prices that we are interested in.
pred_price=np.exp(data_with_predictions['predictions'])

pred_price

# 5.4.Final Prediction

In [None]:
#again we store it in a new variable and attach it to the dataset.
data_with_predictions['predicted_price']=pred_price
data_with_predictions

# 5.5.Remarks

* The new dataframe is actually the first two observations of the original dataset.
* The first prediction is within 31% of the observed value.
* The second prediction is at around 20% from the observed value.
* The **combined prediction is within 27%** of the observed values; our R-squared is at around 25%.
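The percentages above can be computed along the following lines. This is a minimal sketch: the helper `percent_error` is hypothetical, and the predicted and observed prices are placeholder numbers, not the values from this notebook.

```python
import numpy as np

def percent_error(predicted, actual):
    """Absolute difference between prediction and observation, in percent of the observation."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.abs(predicted - actual) / actual * 100

# Placeholder values (assumption): the model predicts log prices,
# so we exponentiate first to get back to dollars.
log_pred = np.log([300_000.0, 450_000.0])
pred_price = np.exp(log_pred)
actual = np.array([250_000.0, 500_000.0])

errors = percent_error(pred_price, actual)
print(errors)          # approximately [20. 10.]
print(errors.mean())   # approximately 15.0
```

How the two errors are combined (a simple mean is used here) is itself a choice and should be stated alongside the result.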

# 6.Conclusion

* We first identified the independent variables and the dependent variable, price.
* After visualizing the dataset using graphics, we inferred:
  1. Most of the **houses are located in the Seattle Metropolis.**
  
  2. **Most of the houses were built in the 21st century.**
  3. We observed that the **housing in King County is primarily influenced by the living area and the grade of the construction materials** used to build the house, followed by the number of bedrooms and the number of bathrooms.
  4. The **neighborhoods are distinctly demarcated** because of the similarity in the sizes and prices of houses in close proximity.
  5. **When it comes to prices and the number of bedrooms, bathrooms and floors, there are a lot of exceptions because of the unique locations of the houses merged into one large dataset.**
  6. **Houses built with better construction materials cost more on average, though they do not guarantee a better condition of the house.**
  7. Houses that have an excellent view of the waterfront are priced significantly higher than those that totally lack a view or have a compromised view.
* We dealt with outliers by plotting the probability density function of some variables. We also linearized the model and identified the regressors keeping in mind the assumptions of the Ordinary Least Squares Method.
* We built a **regression model using the OLS Method with an R-squared of around 74%.**
* We also predicted the prices of two houses and compared them to their actual values. We found that their **combined accuracy was 73% overall.**