# Analysis of Prices by Area in Valencia

First, we import the data with pandas and create a DataFrame.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
data_file_path = "./results/all_areas.csv"

areas_valencia_df = pd.read_csv(data_file_path)

areas_valencia_df.head()

Now that the data is loaded, we can do some exploratory data analysis (EDA).

In [None]:
# Filter rows with at least one NaN
rows_with_nan = areas_valencia_df[areas_valencia_df.isna().any(axis=1)]
print(f"There are {len(rows_with_nan)} areas with missing data")
rows_with_nan

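It seems that none of the `POBLATS` areas have any data, so we remove them from the analysis and mention this in the conclusions. Since `dropna()` drops every row with at least one missing value, this removes exactly the rows listed above.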
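Before dropping rows, it can help to confirm where the gaps are. The sketch below counts missing values per column and groups the incomplete rows by district, so we can check that they really all belong to `POBLATS` districts. The helper `missing_data_summary` and the toy values are hypothetical, and it assumes the district name lives in the `distrito` column used later in the notebook.

```python
import pandas as pd


def missing_data_summary(df: pd.DataFrame, group_col: str = "distrito"):
    """Return (NaN count per column, incomplete-row count per group)."""
    nan_per_column = df.isna().sum()
    # Rows with at least one NaN, same filter as in the cell above
    incomplete_rows = df[df.isna().any(axis=1)]
    rows_per_group = incomplete_rows[group_col].value_counts()
    return nan_per_column, rows_per_group


# Hypothetical toy data for illustration only, not the real dataset
toy = pd.DataFrame({
    "distrito": ["DISTRICT_A", "POBLATS DEL NORD", "POBLATS DEL SUD"],
    "barrio": ["AREA_1", "AREA_2", "AREA_3"],
    "precio_2022_euros_m2": [2000.0, None, None],
})

nan_per_column, rows_per_group = missing_data_summary(toy)
# True if every incomplete row belongs to a district whose name starts with "POBLATS"
all_poblats = bool(rows_per_group.index.str.startswith("POBLATS").all())
print(nan_per_column)
print(rows_per_group)
print(all_poblats)
```

On the notebook's data, the same check would be `missing_data_summary(areas_valencia_df)`.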

In [4]:
areas_valencia_no_nans_df = areas_valencia_df.dropna()

Now let's compute some key statistics for our quantitative price variables.

In [None]:
quantitative_columns = [
    'precio_2022_euros_m2',
    'precio_2010_euros_m2',
    'max_historico_euros_m2',
]

areas_valencia_no_nans_df[quantitative_columns].describe()

In 2022, the mean price per square meter across the areas was around 2100€, with a median of around 2000€. Over the 12 years since 2010, the mean price has risen by about 300€, which is an increase of roughly 17% relative to the 2010 mean of about 1800€.
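A percentage increase is measured against the starting value, here the 2010 mean. As a minimal sketch (the helper `mean_price_change` is hypothetical, not part of the notebook), the function below computes the absolute and relative change between the two column means. The toy values are chosen only so that the means match the rounded figures above, about 1800€ in 2010 and 2100€ in 2022.

```python
import pandas as pd


def mean_price_change(df, old_col="precio_2010_euros_m2", new_col="precio_2022_euros_m2"):
    """Return (absolute change, percentage change) between the column means.

    The percentage is relative to the old (2010) mean, the baseline of the comparison.
    """
    old_mean = df[old_col].mean()
    new_mean = df[new_col].mean()
    absolute = new_mean - old_mean
    percent = 100 * absolute / old_mean
    return absolute, percent


# Hypothetical toy data whose means match the rounded figures in the text
toy = pd.DataFrame({
    "precio_2010_euros_m2": [1500.0, 2100.0],
    "precio_2022_euros_m2": [1800.0, 2400.0],
})
absolute, percent = mean_price_change(toy)
print(f"+{absolute:.0f} €/m2 ({percent:.1f}%)")
```

On the real data, `mean_price_change(areas_valencia_no_nans_df)` gives the exact figures instead of rounded ones. Dividing by the 2022 mean instead (300 / 2100) gives about 14%, which understates the increase.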

Now let's see how the 2022 prices look per neighbourhood.

In [None]:
# Neighbourhood names on the x-axis, 2022 price per m² on the y-axis
x = areas_valencia_no_nans_df['barrio']
y = areas_valencia_no_nans_df['precio_2022_euros_m2']

plt.figure(figsize=(12, 10))
plt.bar(x, y, color='skyblue')

plt.title('Prices per Neighbourhood (€/m²)', fontsize=14)
plt.xlabel('Neighbourhood', fontsize=10)
plt.ylabel('Price 2022 (€/m²)', fontsize=10)

# Label only every second bar so the rotated names do not overlap
tick_positions = range(0, len(x), 2)
plt.xticks(tick_positions, x.iloc[tick_positions], rotation=45, ha='right')

plt.tight_layout()
plt.show()

As expected, prices vary widely: some neighbourhoods are at around 1000€ per square meter, while others exceed 3000€ per square meter.
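To put numbers on this spread instead of reading it off the chart, a minimal sketch using pandas' `DataFrame.nsmallest` and `DataFrame.nlargest` can list the cheapest and most expensive neighbourhoods together with the overall price range. The helper name `price_extremes` and the toy values are hypothetical.

```python
import pandas as pd


def price_extremes(df, n=3, price_col="precio_2022_euros_m2", name_col="barrio"):
    """Return the n cheapest rows, the n most expensive rows, and the price range."""
    cheapest = df.nsmallest(n, price_col)[[name_col, price_col]]
    priciest = df.nlargest(n, price_col)[[name_col, price_col]]
    price_range = df[price_col].max() - df[price_col].min()
    return cheapest, priciest, price_range


# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "barrio": ["AREA_1", "AREA_2", "AREA_3", "AREA_4"],
    "precio_2022_euros_m2": [1000.0, 3200.0, 2000.0, 1500.0],
})
cheapest, priciest, price_range = price_extremes(toy, n=2)
print(cheapest)
print(priciest)
print(price_range)
```

On the notebook's data this would be `price_extremes(areas_valencia_no_nans_df)`.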

Now let's group the neighbourhoods by district to get a more concise picture.

In [None]:
# Unweighted mean: each neighbourhood counts equally within its district
districts_valencia_no_nans_df = areas_valencia_no_nans_df.groupby('distrito')['precio_2022_euros_m2'].mean().reset_index()

x = districts_valencia_no_nans_df['distrito']
y = districts_valencia_no_nans_df['precio_2022_euros_m2']

plt.figure(figsize=(12, 10))
plt.bar(x, y, color='lightcoral')

plt.title('Mean price per district (€/m²)', fontsize=14)
plt.xlabel('Districts', fontsize=10)
plt.ylabel('Mean price 2022 (€/m²)', fontsize=10)
plt.xticks(rotation=45, ha='right')  # Rotate x-axis labels for better readability

# Show the plot
plt.tight_layout()
plt.show()

With the data aggregated by district, the picture is clearer: the three most expensive districts are L'EIXAMPLE, CIUTAT VELLA and EL PLA DEL REAL, while the three cheapest are L'OLIVERETA, JESUS and BENICALAP.
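The district ranking can also be computed directly instead of read from the chart. This sketch (the `rank_districts` helper and the toy data are hypothetical) aggregates neighbourhood prices per district with `groupby(...).agg`, sorts by the mean, and returns the most and least expensive districts. Like the cell above, it uses an unweighted mean, so every neighbourhood counts equally. The `median` and `count` columns help spot districts whose mean rests on only a few neighbourhoods.

```python
import pandas as pd


def rank_districts(df, n=3, price_col="precio_2022_euros_m2", group_col="distrito"):
    """Return the n most and n least expensive districts by mean neighbourhood price."""
    summary = (
        df.groupby(group_col)[price_col]
        .agg(["mean", "median", "count"])  # unweighted per-district statistics
        .sort_values("mean", ascending=False)
    )
    most_expensive = summary.head(n)
    cheapest = summary.tail(n).iloc[::-1]  # cheapest district first
    return most_expensive, cheapest


# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "distrito": ["DISTRICT_A", "DISTRICT_A", "DISTRICT_B",
                 "DISTRICT_C", "DISTRICT_C", "DISTRICT_D", "DISTRICT_D"],
    "precio_2022_euros_m2": [3000.0, 2800.0, 1200.0, 2000.0, 2200.0, 1000.0, 1100.0],
})
most_expensive, cheapest = rank_districts(toy, n=2)
print(most_expensive)
print(cheapest)
```

On the notebook's data, `rank_districts(areas_valencia_no_nans_df)` returns the top three and bottom three districts as tables.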