# **Description of the dataset**

The `describe()` method in pandas returns summary statistics for a DataFrame or Series. By default it computes the
count, mean, standard deviation, minimum, quartiles, and maximum of each numerical column in the DataFrame;
non-numeric columns are skipped unless you ask for them.
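
As a minimal sketch of the numeric-only default, the toy DataFrame below shows that `describe()` leaves out a text column unless `include="all"` is passed. The column names `city` and `rooms` are invented for illustration and are not part of the housing dataset.

```python
import pandas as pd

# Hypothetical toy data, not taken from the housing dataset.
toy = pd.DataFrame({
    "city": ["A", "B", "A", "C"],
    "rooms": [3, 5, 4, 8],
})

numeric_summary = toy.describe()            # numeric columns only
full_summary = toy.describe(include="all")  # also summarises the text column

print(numeric_summary.columns.tolist())
print(full_summary.index.tolist())
```

With `include="all"`, the text column gets `unique`, `top`, and `freq` rows, while its numeric rows (mean, std, and so on) are filled with NaN.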

Here is a breakdown of the statistics returned by **.describe()**:

1. **Count:** The number of non-null values in each column.
2. **Mean:** The arithmetic average of each column.
3. **Standard Deviation (std):** The sample standard deviation of each column, a measure of how widely the values are
spread around the mean.
4. **Minimum (min):** The smallest value in each column.
5. **25th Percentile (25%):** Also known as the first quartile (Q1), the value at or below which 25% of the data falls.
6. **50th Percentile (50%):** Also known as the median, the value at or below which 50% of the data falls.
7. **75th Percentile (75%):** Also known as the third quartile (Q3), the value at or below which 75% of the data falls.
8. **Maximum (max):** The largest value in each column.
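
The list above can be checked against the individual Series methods. This is a small sketch on hand-picked values (the Series `values` is hypothetical, chosen so the results are easy to verify by hand), not output from the housing data.

```python
import pandas as pd

# Hypothetical sample values, chosen to be easy to verify by hand.
values = pd.Series([1, 2, 3, 4, 10])

summary = values.describe()

# Each entry of the summary corresponds to a Series method.
manual = {
    "count": values.count(),
    "mean": values.mean(),
    "std": values.std(),           # sample standard deviation (ddof=1)
    "min": values.min(),
    "25%": values.quantile(0.25),
    "50%": values.median(),
    "75%": values.quantile(0.75),
    "max": values.max(),
}

for name, expected in manual.items():
    print(f"{name}: describe={summary[name]:.4f} manual={expected:.4f}")
```

Note that `std` divides by n - 1 rather than n, so it will differ slightly from a population standard deviation computed elsewhere.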

In [1]:
!pip install pandas



In [2]:
import pandas as pd
df = pd.read_csv("sample_data/california_housing_test.csv")

In [3]:
df.head()

,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
0,-122.05,37.37,27.0,3885.0,661.0,1537.0,606.0,6.6085,344700.0
1,-118.3,34.26,43.0,1510.0,310.0,809.0,277.0,3.599,176500.0
2,-117.81,33.78,27.0,3589.0,507.0,1484.0,495.0,5.7934,270500.0
3,-118.36,33.82,28.0,67.0,15.0,49.0,11.0,6.1359,330000.0
4,-119.67,36.33,19.0,1241.0,244.0,850.0,237.0,2.9375,81700.0


In [5]:
df.describe()

,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
count,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0
mean,-119.5892,35.63539,28.845333,2599.578667,529.950667,1402.798667,489.912,3.807272,205846.275
std,1.994936,2.12967,12.555396,2155.593332,415.654368,1030.543012,365.42271,1.854512,113119.68747
min,-124.18,32.56,1.0,6.0,2.0,5.0,2.0,0.4999,22500.0
25%,-121.81,33.93,18.0,1401.0,291.0,780.0,273.0,2.544,121200.0
50%,-118.485,34.27,29.0,2106.0,437.0,1155.0,409.5,3.48715,177650.0
75%,-118.02,37.69,37.0,3129.0,636.0,1742.75,597.25,4.656475,263975.0
max,-114.49,41.92,52.0,30450.0,5419.0,11935.0,4930.0,15.0001,500001.0


In [4]:
df['total_rooms'].describe()

,total_rooms
count,3000.0
mean,2599.578667
std,2155.593332
min,6.0
25%,1401.0
50%,2106.0
75%,3129.0
max,30450.0


In [6]:
df.iloc[299]

,299
longitude,-122.3
latitude,37.97
housing_median_age,34.0
total_rooms,2854.0
total_bedrooms,528.0
population,1211.0
households,452.0
median_income,3.5353
median_house_value,164700.0
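
`df.iloc[299]` selects by integer position. It matches the index label 299 here only because the DataFrame still has its default 0-based index. A minimal sketch on invented data (the frame `toy` and its values are hypothetical) shows how position and label diverge once rows are reordered:

```python
import pandas as pd

# Hypothetical three-row frame with default index labels 0, 1, 2.
toy = pd.DataFrame({"total_bedrooms": [30, 10, 20]})

# Sorting reorders the rows but keeps their original labels: 1, 2, 0.
by_size = toy.sort_values(by="total_bedrooms")

first_by_position = by_size.iloc[0]  # first row after sorting (label 1)
row_with_label_0 = by_size.loc[0]    # row whose index label is 0

print(first_by_position["total_bedrooms"])
print(row_with_label_0["total_bedrooms"])
```

Use `.iloc` when you mean "the n-th row as currently ordered" and `.loc` when you mean "the row with this label".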


In [9]:
# Five rows with the fewest total bedrooms (ascending sort is the default)
df.sort_values(by="total_bedrooms").head()


,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
1115,-116.95,33.86,1.0,6.0,2.0,8.0,2.0,1.625,55000.0
2640,-114.62,33.62,26.0,18.0,3.0,5.0,3.0,0.536,275000.0
740,-117.12,32.66,52.0,16.0,4.0,8.0,3.0,1.125,60000.0
1355,-117.11,32.66,52.0,25.0,5.0,14.0,9.0,1.625,118800.0
2690,-118.06,34.03,36.0,21.0,7.0,21.0,9.0,2.375,175000.0


In [10]:
# Five rows with the most total bedrooms
df.sort_values(by="total_bedrooms", ascending=False).head()


,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
1563,-118.44,33.98,21.0,18132.0,5419.0,7431.0,4930.0,5.3359,500001.0
2429,-117.2,33.58,2.0,30450.0,5033.0,9419.0,3197.0,4.5936,174300.0
978,-121.53,38.48,5.0,27870.0,5027.0,11935.0,4855.0,4.8811,212200.0
2014,-117.22,32.86,4.0,16289.0,4585.0,7604.0,4176.0,3.6287,280800.0
292,-116.36,33.78,6.0,24121.0,4522.0,4176.0,2221.0,3.3799,239300.0


In [11]:
df["population"].head(10)

,population
0,1537.0
1,809.0
2,1484.0
3,49.0
4,850.0
5,663.0
6,604.0
7,1341.0
8,1446.0
9,2830.0


In [12]:
# Rows whose population is exactly 7431
df[df["population"] == 7431.0]


,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
1563,-118.44,33.98,21.0,18132.0,5419.0,7431.0,4930.0,5.3359,500001.0


In [14]:
# Boolean mask: True where population exceeds 7431
after_7431 = df["population"] > 7431
print(after_7431)

0       False
1       False
2       False
3       False
4       False
        ...  
2995    False
2996    False
2997    False
2998    False
2999    False
Name: population, Length: 3000, dtype: bool
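
The comparison above returns a boolean Series with one entry per row. As a small sketch on made-up numbers (the Series `population` below is hypothetical, not the housing column), summing such a mask counts the matching rows, because `True` is treated as 1:

```python
import pandas as pd

# Hypothetical values, not taken from the housing dataset.
population = pd.Series([1537, 809, 8152, 49, 9419])
mask = population > 7431

n_matches = mask.sum()      # True counts as 1, False as 0
matches = population[mask]  # keep only entries where the mask is True

print(n_matches)
print(matches.tolist())
```

Checking the count first is a quick way to see how many rows a filter will return before displaying them.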


In [13]:
# Use the mask to keep only rows with population above 7431
after_7431 = df["population"] > 7431
df[after_7431]

,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
33,-118.08,34.55,5.0,16181.0,2971.0,8152.0,2651.0,4.5237,141800.0
321,-121.73,37.68,17.0,20354.0,3493.0,8768.0,3293.0,5.4496,238900.0
947,-117.23,33.91,9.0,11654.0,2100.0,7596.0,2127.0,4.0473,127200.0
978,-121.53,38.48,5.0,27870.0,5027.0,11935.0,4855.0,4.8811,212200.0
1146,-117.27,33.15,4.0,23915.0,4135.0,10877.0,3958.0,4.6357,244900.0
1283,-117.18,32.92,4.0,15025.0,2616.0,7560.0,2392.0,5.196,210700.0
1597,-117.12,33.49,4.0,21988.0,4055.0,8824.0,3252.0,3.9963,191100.0
2014,-117.22,32.86,4.0,16289.0,4585.0,7604.0,4176.0,3.6287,280800.0
2186,-116.14,34.45,12.0,8796.0,1721.0,11139.0,1680.0,2.2612,137500.0
2429,-117.2,33.58,2.0,30450.0,5033.0,9419.0,3197.0,4.5936,174300.0


In [15]:
# Rows with population from 2000 to 2010, both endpoints included
mid_2000 = df["population"].between(2000, 2010)
df[mid_2000]

,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
384,-122.42,37.76,52.0,2038.0,629.0,2007.0,596.0,2.5701,266700.0
421,-118.21,33.9,35.0,2420.0,579.0,2010.0,540.0,2.0817,104600.0
503,-119.8,36.82,24.0,5377.0,1005.0,2010.0,982.0,3.4542,121200.0
1268,-117.94,34.06,32.0,3418.0,662.0,2003.0,622.0,4.0333,210200.0
1428,-117.04,32.62,26.0,3620.0,607.0,2000.0,593.0,4.9962,156000.0
1669,-117.09,32.75,19.0,2739.0,707.0,2004.0,622.0,1.6318,117700.0
1828,-117.56,34.42,6.0,4264.0,749.0,2005.0,666.0,3.4695,138800.0
2244,-120.52,35.24,5.0,4413.0,804.0,2003.0,725.0,5.0267,253300.0
2256,-118.6,34.21,19.0,2581.0,857.0,2004.0,784.0,2.6159,182300.0
2297,-118.1,33.97,35.0,2426.0,529.0,2010.0,514.0,2.9922,163500.0
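
The result includes rows with a population of exactly 2000 and 2010 because `between` is inclusive at both ends by default. The sketch below uses an invented Series and assumes pandas 1.3 or later, where `inclusive` accepts the strings `"both"`, `"neither"`, `"left"`, and `"right"`:

```python
import pandas as pd

# Hypothetical values straddling the 2000-2010 range.
population = pd.Series([1999, 2000, 2005, 2010, 2011])

closed = population.between(2000, 2010)                      # 2000 <= x <= 2010
open_ = population.between(2000, 2010, inclusive="neither")  # 2000 < x < 2010

print(population[closed].tolist())
print(population[open_].tolist())
```

If the endpoints should be excluded, passing `inclusive="neither"` is simpler than combining two comparisons by hand.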
