### Location or Size: What Influences House Prices in Mexico?

In [None]:
import matplotlib.pyplot as plt
import pandas as pd

In [None]:
# Import "data/mexico-real-estate-clean.csv"
df = pd.read_csv("data/mexico-real-estate-clean.csv")

# Print object type, shape, and head
print("df type:", type(df))
print("df shape:", df.shape)
df.head()

### Research Question 1: Which state has the most expensive real estate market?

Do housing prices vary by state? If so, which are the most expensive states for purchasing a home? During our exploratory data analysis, we used descriptive statistics like mean and median to get an idea of the "typical" house price in Mexico. Now, we need to break that calculation down by state and visualize the results.

We know in which state each house is located thanks to the "state" column. The next step is to divide our dataset into groups (one per state) and calculate the mean house price for each group.
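The split-then-aggregate step described above can be sketched on a tiny example. The `toy` DataFrame and its values are invented for illustration and are not taken from the real dataset; only the column names mirror the ones used in this notebook.

```python
import pandas as pd

# Toy data, invented for illustration only (not from the real dataset)
toy = pd.DataFrame(
    {
        "state": ["A", "A", "B", "B", "C"],
        "price_usd": [100.0, 200.0, 400.0, 600.0, 50.0],
    }
)

# Split rows into one group per state, then average the price in each group
toy_mean_by_state = toy.groupby("state")["price_usd"].mean()

# Sort so the state with the highest mean price comes first
toy_mean_by_state = toy_mean_by_state.sort_values(ascending=False)
print(toy_mean_by_state)
```

The result is a Series whose index holds the state names and whose values hold the group means, which is the same shape the task below asks for.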



Task 1.4.2: Use the groupby method to create a Series named mean_price_by_state, where the index contains each state in the dataset and the values correspond to the mean house price for that state. Make sure your Series is sorted from highest to lowest mean price.

In [None]:
# Declare variable `mean_price_by_state`
mean_price_by_state = df.groupby("state")["price_usd"].mean().sort_values(ascending=False)

# Print object type, shape, and head
print("mean_price_by_state type:", type(mean_price_by_state))
print("mean_price_by_state shape:", mean_price_by_state.shape)
mean_price_by_state.head(10)

In [None]:
# Create bar chart from `mean_price_by_state` using pandas
mean_price_by_state.plot(
    kind="bar",
    xlabel="State",
    ylabel="Mean Price [USD]",
    title="Mean House Price by State",
)


In [None]:
# Create "price_per_m2" column
df["price_per_m2"] = df["price_usd"] / df["area_m2"]

# Print object type, shape, and head
print("df type:", type(df))
print("df shape:", df.shape)
df.head()

In [None]:
# Group `df` by "state", create bar chart of mean "price_per_m2"
(
    df.groupby("state")["price_per_m2"]
    .mean()
    .sort_values(ascending=False)
    .plot(
        kind="bar",
        xlabel="State",
        ylabel="Mean Price per M^2 [USD]",
        title="Mean House Price per M^2 by State",
    )
)


### Research Question 2: Is there a relationship between home size and price?

From our previous question, we know that the location of a home affects its price (especially if it's in Mexico City), but what about home size? Does the size of a house influence price?

A scatter plot can be helpful when evaluating the relationship between two columns because it lets you see if two variables are correlated — in this case, if an increase in home size is associated with an increase in price.
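To put a number on what the scatter plot shows, one option is the Pearson correlation coefficient. The sketch below uses invented toy values (not the real data) to show that `Series.corr`, which uses Pearson by default, matches the textbook definition of covariance divided by the product of the standard deviations.

```python
import pandas as pd

# Toy values, invented for illustration only
area = pd.Series([50.0, 80.0, 120.0, 200.0])
price = pd.Series([40_000.0, 70_000.0, 90_000.0, 180_000.0])

# Pearson correlation from its definition: cov(x, y) / (std(x) * std(y))
manual_r = area.cov(price) / (area.std() * price.std())

# pandas computes the same quantity with Series.corr (Pearson by default)
pandas_r = area.corr(price)
print(manual_r, pandas_r)
```

A coefficient near 1 means larger homes tend to cost more, a value near 0 means no linear relationship, and a value near -1 means larger homes tend to cost less.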

In [None]:
# Create scatter plot of "price_usd" vs "area_m2"
plt.scatter(x=df["area_m2"], y=df["price_usd"])

# Add x-axis label
plt.xlabel("Area [sq meters]")

# Add y-axis label
plt.ylabel("Price [USD]")

# Add title
plt.title("Price vs. Area");

In [None]:
# Calculate correlation of "price_usd" and "area_m2"
p_correlation = df["area_m2"].corr(df["price_usd"])

# Print correlation coefficient
print("Correlation of 'area_m2' and 'price_usd' (all Mexico):", p_correlation)

In [None]:
# Declare variable `df_morelos` by subsetting `df`
df_morelos = df[df["state"] == "Morelos"]

# Print object type, shape, and head
print("df_morelos type:", type(df_morelos))
print("df_morelos shape:", df_morelos.shape)
df_morelos.head()


In [None]:
# Create scatter plot of "price_usd" vs "area_m2" in Morelos
plt.scatter(x=df_morelos["area_m2"], y=df_morelos["price_usd"])

# Add x-axis label
plt.xlabel("Area [sq meters]")

# Add y-axis label
plt.ylabel("Price [USD]")

# Add title
plt.title("Morelos: Price vs. Area");

In [None]:
# Calculate correlation of "price_usd" and "area_m2" in `df_morelos`
p_correlation = df_morelos["area_m2"].corr(df_morelos["price_usd"])

# Print correlation coefficient
print("Correlation of 'area_m2' and 'price_usd' (Morelos):", p_correlation)

In [None]:
# Declare variable `df_mexico_city` by subsetting `df`
df_mexico_city = df[df["state"] == "Distrito Federal"]

# Print object type and shape
print("df_mexico_city type:", type(df_mexico_city))
print("df_mexico_city shape:", df_mexico_city.shape)

# Create a scatter plot of "price_usd" vs "area_m2" in Distrito Federal
plt.scatter(x=df_mexico_city["area_m2"], y=df_mexico_city["price_usd"])

# Add x-axis label
plt.xlabel("Area [sq meters]")

# Add y-axis label
plt.ylabel("Price [USD]")

# Add title
plt.title("Mexico City: Price vs. Area")

# Calculate correlation of "price_usd" and "area_m2" in `df_mexico_city`
p_correlation = df_mexico_city["area_m2"].corr(df_mexico_city["price_usd"])

# Print correlation coefficient
print("Correlation of 'area_m2' and 'price_usd' (Mexico City):", p_correlation)