# S1.1_Pandas 

This code demonstrates key pandas features:

- Data Loading: Reading a CSV file from a URL with read_csv()
- Data Inspection: Using head() and info() for data overview
- Grouping: Using groupby() with aggregation (mean())
- Filtering: Boolean indexing to select specific rows
- Statistics: Built-in descriptive statistics with describe()

In [1]:
import pandas as pd
# Load dataset
url = "https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv"
iris_df = pd.read_csv(url)
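
The cell above needs network access. As a minimal offline sketch, `read_csv()` also accepts a file-like object, so the same call can be tried on an in-memory string. The rows below are the five shown in the Dataset Preview output of this notebook; the names `csv_text` and `sample_df` are illustrative and not part of the original code.

```python
import io

import pandas as pd

# The five preview rows from this notebook, written out as CSV text
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5.0,3.6,1.4,0.2,setosa
"""

# StringIO wraps the string in a file-like object that read_csv() can parse
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.shape)  # (5, 5)
print(sample_df.dtypes)  # the four measurement columns are parsed as float64
```

This can be handy for checking parsing options on a small sample before pointing `read_csv()` at the full URL.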

In [2]:
# Basic data exploration
print("Dataset Preview:")
print(iris_df.head())  # First 5 rows
print("\nDataset Info:")
iris_df.info()  # Prints data types and non-null counts itself; it returns None, so it is not wrapped in print()

Dataset Preview:
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB


In [3]:
# Group by species and calculate means
species_means = iris_df.groupby('species').mean()
print("\nAverage measurements by species:")
print(species_means)


Average measurements by species:
            sepal_length  sepal_width  petal_length  petal_width
species                                                         
setosa             5.006        3.418         1.464        0.244
versicolor         5.936        2.770         4.260        1.326
virginica          6.588        2.974         5.552        2.026
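
Beyond a single `mean()`, a grouped frame can compute several statistics at once with `agg()`. The sketch below uses a tiny made-up table (the values are NOT taken from the iris dataset) so the results can be checked by hand; `toy_df`, `summary`, and the output column names are hypothetical.

```python
import pandas as pd

# Made-up toy measurements (not iris data), small enough to verify by hand
toy_df = pd.DataFrame(
    {
        "species": ["a", "a", "b", "b"],
        "petal_length": [1.0, 2.0, 4.0, 6.0],
        "petal_width": [0.2, 0.4, 1.0, 2.0],
    }
)

# Named aggregation: output_column=(input_column, function)
summary = toy_df.groupby("species").agg(
    mean_petal_length=("petal_length", "mean"),
    max_petal_width=("petal_width", "max"),
    n_rows=("petal_length", "count"),
)

print(summary)
```

Named aggregation keeps the result columns flat and self-describing, which tends to be easier to read than the nested column labels produced by passing a dict of lists to `agg()`.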


In [4]:
# Basic filtering (rows where sepal_length > 5)
long_sepals = iris_df[iris_df['sepal_length'] > 5]
print("\nFlowers with sepal length > 5 (first 5 rows):")
print(long_sepals.head())


Flowers with sepal length > 5 (first 5 rows):
    sepal_length  sepal_width  petal_length  petal_width species
0            5.1          3.5           1.4          0.2  setosa
5            5.4          3.9           1.7          0.4  setosa
10           5.4          3.7           1.5          0.2  setosa
14           5.8          4.0           1.2          0.2  setosa
15           5.7          4.4           1.5          0.4  setosa
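
Boolean indexing extends to several conditions at once. As a small sketch, the frame below holds only the nine rows that appear in the outputs of this notebook, with their original index labels; the second condition (`petal_width > 0.2`) and the names `rows_df`, `mask`, `wide_petals`, and `same_rows` are assumptions added for illustration.

```python
import pandas as pd

# Rows copied from the outputs shown in this notebook, keeping their index labels
rows_df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 5.4, 5.8, 5.7],
        "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.7, 4.0, 4.4],
        "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.5, 1.2, 1.5],
        "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.2, 0.2, 0.4],
        "species": ["setosa"] * 9,
    },
    index=[0, 1, 2, 3, 4, 5, 10, 14, 15],
)

# Each comparison yields a boolean Series; combine them with & (and), | (or), ~ (not).
# The parentheses are required because & binds more tightly than > in Python.
mask = (rows_df["sepal_length"] > 5) & (rows_df["petal_width"] > 0.2)
wide_petals = rows_df[mask]
print(wide_petals)  # keeps the rows labelled 5 and 15

# The same filter expressed as a query string
same_rows = rows_df.query("sepal_length > 5 and petal_width > 0.2")
```

Note that the filtered frame keeps the original index labels rather than renumbering from zero; `reset_index(drop=True)` can be applied afterwards if a fresh index is wanted.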


In [5]:
# Statistical summary
print("\nStatistical Summary:")
print(iris_df.describe())


Statistical Summary:
       sepal_length  sepal_width  petal_length  petal_width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.054000      3.758667     1.198667
std        0.828066     0.433594      1.764420     0.763161
min        4.300000     2.000000      1.000000     0.100000
25%        5.100000     2.800000      1.600000     0.300000
50%        5.800000     3.000000      4.350000     1.300000
75%        6.400000     3.300000      5.100000     1.800000
max        7.900000     4.400000      6.900000     2.500000
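
By default `describe()` on this frame summarises only the numeric columns, which is why `species` is absent from the table above. The sketch below, using the five preview rows from this notebook, shows two variations: custom percentiles for a numeric column and the count/unique/top/freq summary that a text column produces. The names `sample_df`, `numeric_summary`, and `text_summary` are illustrative.

```python
import pandas as pd

# Two columns from the five preview rows shown earlier in this notebook
sample_df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
        "species": ["setosa"] * 5,
    }
)

# Request the 10th and 90th percentiles in place of the default quartiles
numeric_summary = sample_df["sepal_length"].describe(percentiles=[0.1, 0.9])
print(numeric_summary)

# A text column is summarised by count, unique, top (most frequent value), and freq
text_summary = sample_df["species"].describe()
print(text_summary)
```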
