# Analyzing a Real-World Dataset with SQL and Python

## About This Notebook
**The purpose of this notebook is to perform exploratory data analysis on a real-world dataset. The dataset used here comes from [Kaggle](https://www.kaggle.com "Kaggle") and describes plant growth milestones under different environmental and management factors.
You can download the dataset from [this link](https://www.kaggle.com/datasets/gorororororo23/plant-growth-data-classification).**


### Importing necessary libraries

In [None]:
import pandas as pd  # loading and handling the CSV data
import sqlite3       # built-in SQLite database driver

## Reading the CSV file into a DataFrame

In [None]:
# Adjust the path to wherever you saved the downloaded CSV file
df = pd.read_csv('Downloads/plant_growth_data.csv')
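Pandas does the parsing in one call. As a sketch of what that step involves without pandas, the snippet below reads a few rows with the standard-library `csv.DictReader` and converts the numeric columns by hand. The column names match those used in this notebook; the three data rows and the column order are assumptions made for illustration, not copied from the real file.

```python
import csv
import io

# Hypothetical sample using this notebook's column names
# (values are illustrative, not taken from the real dataset).
SAMPLE_CSV = """Soil_Type,Sunlight_Hours,Water_Frequency,Fertilizer_Type,Temperature,Humidity,Growth_Milestone
loam,5.2,daily,organic,25.1,60.3,1
sandy,7.9,weekly,chemical,33.4,78.0,0
clay,4.6,bi-weekly,none,21.7,55.5,1
"""

# csv gives every field as a string, so numeric columns need explicit conversion.
NUMERIC_COLUMNS = {"Sunlight_Hours": float, "Temperature": float,
                   "Humidity": float, "Growth_Milestone": int}


def load_rows(text):
    """Parse CSV text into a list of dicts, converting the numeric columns."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for column, convert in NUMERIC_COLUMNS.items():
            record[column] = convert(record[column])
        rows.append(record)
    return rows


rows = load_rows(SAMPLE_CSV)
print(len(rows), rows[0]["Soil_Type"], rows[0]["Growth_Milestone"])  # 3 loam 1
```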

## Description of the Columns
+ **Soil_Type:** The type or composition of soil in which the plants are grown.
+ **Sunlight_Hours:** The number of hours of sunlight exposure received by the plants.
+ **Water_Frequency:** How often the plants are watered, indicating the watering schedule.
+ **Fertilizer_Type:** The type of fertilizer used for nourishing the plants.
+ **Temperature:** The ambient temperature conditions under which the plants are grown.
+ **Humidity:** The level of moisture or humidity in the environment surrounding the plants.
+ **Growth_Milestone:** A marker of whether the plant reached a significant stage or event in its growth (1 = growth event, 0 = no growth event).

In [None]:
df

## Setting Up the Database Engine

In [None]:
# Opens classification.db, creating the file if it does not exist yet
con = sqlite3.connect('classification.db')
cur = con.cursor()
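The notebook creates a cursor `cur` but runs every query through the `%sql` magic. As a minimal sketch of using a cursor directly, the snippet below works on a throwaway in-memory database (so it does not touch `classification.db`); the table name `plants` and its rows are made up for illustration.

```python
import sqlite3

# An in-memory database keeps this sketch separate from classification.db.
demo_con = sqlite3.connect(":memory:")
demo_cur = demo_con.cursor()

demo_cur.execute("create table plants (Soil_Type text, Growth_Milestone integer)")
# executemany runs one insert per tuple; '?' placeholders let sqlite3 bind the values.
demo_cur.executemany(
    "insert into plants values (?, ?)",
    [("loam", 1), ("clay", 0), ("loam", 0)],
)
demo_con.commit()

# A parameterised query: the soil type is passed as data, not pasted into the SQL text.
demo_cur.execute("select count(*) from plants where Soil_Type = ?", ("loam",))
loam_count = demo_cur.fetchone()[0]
print(loam_count)  # 2
demo_con.close()
```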

## Loading SQL magic
+ Since we are using SQL for the analysis, we load the `sql` magic extension (for example, from the ipython-sql package) so that SQL queries can be run directly in notebook cells.

In [None]:
%load_ext sql
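The `%sql` magic runs a query and shows the result as a table with column headers. A rough stdlib-only sketch of the same idea (not how the extension itself is implemented) is a hypothetical helper `run_query` that reads the column names from `cursor.description`; the table `t` and its values are made up.

```python
import sqlite3


def run_query(connection, sql, params=()):
    """Hypothetical helper: return (column_names, rows) for a query."""
    cursor = connection.execute(sql, params)
    # cursor.description holds one 7-tuple per result column; item 0 is the name.
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


demo_con = sqlite3.connect(":memory:")
demo_con.execute("create table t (Humidity real, Temperature real)")
demo_con.executemany("insert into t values (?, ?)", [(60.0, 25.0), (80.0, 35.0)])

columns, rows = run_query(demo_con, "select max(Humidity) as 'Max. Humidity' from t")
print(columns, rows)  # ['Max. Humidity'] [(80.0,)]
demo_con.close()
```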

## Connecting the SQL Magic to Our Database

In [None]:
%sql sqlite:///classification.db

## Writing the DataFrame to a SQL Table

In [None]:
# Replace the table if it already exists and do not write the DataFrame index as a column
df.to_sql("plant_growth_data", con, if_exists='replace', index=False, method='multi')
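With `if_exists='replace'`, `to_sql` drops any existing table of that name, recreates it from the DataFrame's columns, and inserts the rows. A rough stdlib equivalent might look like the hypothetical helper `write_table` below. It assumes the rows are a list of dicts (such as those produced by `csv.DictReader`) and leaves column types to SQLite's type affinity, whereas pandas maps dtypes to SQL types.

```python
import sqlite3


def write_table(connection, name, rows):
    """Hypothetical helper: replace table `name` with the given list of dicts."""
    columns = list(rows[0])
    # Names are interpolated into the SQL text, so only use trusted names here.
    quoted = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    # Mimic if_exists='replace': drop the old table, then recreate it.
    connection.execute(f'drop table if exists "{name}"')
    connection.execute(f'create table "{name}" ({quoted})')
    connection.executemany(
        f'insert into "{name}" ({quoted}) values ({placeholders})',
        [tuple(row[c] for c in columns) for row in rows],
    )
    connection.commit()


demo_con = sqlite3.connect(":memory:")
sample = [{"Soil_Type": "loam", "Growth_Milestone": 1},
          {"Soil_Type": "clay", "Growth_Milestone": 0}]
write_table(demo_con, "plant_growth_data", sample)
write_table(demo_con, "plant_growth_data", sample)  # replacing does not duplicate rows
row_count = demo_con.execute("select count(*) from plant_growth_data").fetchone()[0]
print(row_count)  # 2
demo_con.close()
```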

## Data Analysis

#### 1. Glimpse of the table

In [None]:
%sql select * from plant_growth_data limit 5;

#### 2. Number of rows in the table

In [None]:
%sql select count(*) as 'number of rows' from plant_growth_data;

#### 3. Number of plants at each watering frequency that reached growth milestone 1

In [None]:
%sql select Water_Frequency, count(*) as 'Number of Plants' from plant_growth_data where Growth_Milestone = 1 group by Water_Frequency;
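The `where ... group by` pattern filters rows first and then counts the remaining rows per group. To check such a query by hand, the sketch below runs the same SQL on a few made-up rows in an in-memory database and compares the result with `collections.Counter`:

```python
import sqlite3
from collections import Counter

# Illustrative (Water_Frequency, Growth_Milestone) rows only, not real data.
sample = [("daily", 1), ("daily", 1), ("weekly", 0),
          ("weekly", 1), ("bi-weekly", 0), ("daily", 0)]

demo_con = sqlite3.connect(":memory:")
demo_con.execute("create table plant_growth_data (Water_Frequency text, Growth_Milestone integer)")
demo_con.executemany("insert into plant_growth_data values (?, ?)", sample)

sql_counts = dict(demo_con.execute(
    "select Water_Frequency, count(*) from plant_growth_data "
    "where Growth_Milestone = 1 group by Water_Frequency"
).fetchall())

# The same computation in plain Python: filter on the milestone, then count per key.
py_counts = Counter(freq for freq, milestone in sample if milestone == 1)

print(sorted(sql_counts.items()))  # [('daily', 2), ('weekly', 1)]
assert sql_counts == dict(py_counts)
demo_con.close()
```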

#### 4. Maximum and Minimum Humidity

In [None]:
%sql select min(Humidity) as 'Min. Humidity', max(Humidity) as 'Max. Humidity' from plant_growth_data;

#### 5. Maximum and Minimum Temperature

In [None]:
%sql select min(Temperature) as 'Min. Temperature', max(Temperature) as 'Max. Temperature' from plant_growth_data;

#### 6. Maximum and Minimum Sunlight_Hours

In [None]:
%sql select min(Sunlight_Hours) as 'Min. Sunlight_Hours', max(Sunlight_Hours) as 'Max. Sunlight_Hours' from plant_growth_data;
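Queries 4 to 6 each issue a separate query for one column. Several aggregates can also be combined in one `select` that returns a single row; a sketch on a few made-up rows (not the real data):

```python
import sqlite3

# Illustrative (Humidity, Temperature, Sunlight_Hours) rows, not real data.
sample = [(55.0, 21.5, 4.0), (80.0, 34.9, 9.5), (62.0, 27.0, 6.0)]

demo_con = sqlite3.connect(":memory:")
demo_con.execute(
    "create table plant_growth_data (Humidity real, Temperature real, Sunlight_Hours real)"
)
demo_con.executemany("insert into plant_growth_data values (?, ?, ?)", sample)

# One query returns all six values as a single row.
ranges = demo_con.execute(
    "select min(Humidity), max(Humidity), "
    "min(Temperature), max(Temperature), "
    "min(Sunlight_Hours), max(Sunlight_Hours) from plant_growth_data"
).fetchone()
print(ranges)  # (55.0, 80.0, 21.5, 34.9, 4.0, 9.5)
demo_con.close()
```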

#### 7. Number of plants with and without a growth event
+ A growth event is denoted by 1 and its absence by 0.

In [None]:
%sql select Growth_Milestone, count(*) as 'No. of Plants' from plant_growth_data group by Growth_Milestone;
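Because Growth_Milestone is coded 1 for a growth event and 0 otherwise, the share of plants with a growth event equals the column's mean, so `avg(Growth_Milestone)` returns the rate directly. A sketch on made-up 0/1 values:

```python
import sqlite3

demo_con = sqlite3.connect(":memory:")
demo_con.execute("create table plant_growth_data (Growth_Milestone integer)")
# Illustrative 0/1 values only, not real data.
demo_con.executemany(
    "insert into plant_growth_data values (?)", [(1,), (0,), (1,), (0,), (1,)]
)

# avg() of a 0/1 column is the fraction of rows equal to 1.
rate = demo_con.execute("select avg(Growth_Milestone) from plant_growth_data").fetchone()[0]
print(f"{rate:.0%}")  # 60%
demo_con.close()
```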

#### 8. Number of plants of each soil type that reached growth milestone 1

In [None]:
%sql select Soil_Type, count(*) as 'No. of Plants' from plant_growth_data where Growth_Milestone = 1 group by Soil_Type;

#### 9. Number of plants of each fertilizer type that reached growth milestone 1

In [None]:
%sql select Fertilizer_Type, count(*) as 'No. of Plants' from plant_growth_data where Growth_Milestone = 1 group by Fertilizer_Type;
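The counts in queries 8 and 9 also depend on how many plants of each type the dataset contains, so a larger group can show more growth events without having a higher growth rate. One way to compare groups of different sizes is to report `avg(Growth_Milestone)` per group alongside the counts. The sketch below uses made-up rows chosen so that the group with the most growth events is not the one with the highest rate:

```python
import sqlite3

# Illustrative (Fertilizer_Type, Growth_Milestone) rows only, not real data.
sample = [("organic", 1), ("organic", 1), ("organic", 1),
          ("organic", 0), ("organic", 0), ("organic", 0),
          ("chemical", 1), ("chemical", 1), ("none", 0)]

demo_con = sqlite3.connect(":memory:")
demo_con.execute("create table plant_growth_data (Fertilizer_Type text, Growth_Milestone integer)")
demo_con.executemany("insert into plant_growth_data values (?, ?)", sample)

# sum() counts growth events, count(*) the group size, avg() the growth rate.
rates = demo_con.execute(
    "select Fertilizer_Type, sum(Growth_Milestone) as grown, count(*) as total, "
    "avg(Growth_Milestone) as rate from plant_growth_data "
    "group by Fertilizer_Type order by rate desc"
).fetchall()
for fertilizer, grown, total, rate in rates:
    print(f"{fertilizer}: {grown}/{total} = {rate:.0%}")
# chemical: 2/2 = 100%
# organic: 3/6 = 50%
# none: 0/1 = 0%
demo_con.close()
```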

## Analyzing Data by Visualizing It

### Importing Libraries for Visualization

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
# Render plots inline, below the cell that produces them
%matplotlib inline

#### 1. Relationship between Sunlight Hours and Water Frequency

In [None]:
lw = %sql select Sunlight_Hours, Water_Frequency from plant_growth_data;
# .DataFrame() converts the query result into a pandas DataFrame for seaborn
plot = sns.jointplot(x='Water_Frequency', y='Sunlight_Hours', data=lw.DataFrame())

+ The plot of Sunlight_Hours against Water_Frequency shows that the most common combination in the dataset is plants that received 5 to 6 hours of sunlight and were watered daily.
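A jointplot shows where points cluster; counting each (Water_Frequency, sunlight band) combination makes the "most common combination" reading explicit. The sketch below uses made-up rows and bins sunlight into whole-hour bands by flooring, a binning choice assumed here for illustration:

```python
import math
from collections import Counter

# Illustrative (Water_Frequency, Sunlight_Hours) pairs, not real data.
sample = [("daily", 5.3), ("daily", 5.8), ("daily", 5.1),
          ("weekly", 7.2), ("bi-weekly", 5.5), ("daily", 9.0)]

# Bin sunlight into 1-hour bands, e.g. 5.3 -> "5-6 h", then count each combination.
combos = Counter(
    (freq, f"{math.floor(hours)}-{math.floor(hours) + 1} h") for freq, hours in sample
)
top_combo, top_count = combos.most_common(1)[0]
print(top_combo, top_count)  # ('daily', '5-6 h') 3
```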

#### 2. Relationship of growth with water frequency and sunlight hours

In [None]:
# Sunlight_Hours vs. Growth_Milestone
lg = %sql select Sunlight_Hours, Growth_Milestone from plant_growth_data;
plot = sns.jointplot(x='Growth_Milestone', y='Sunlight_Hours', data=lg.DataFrame())
# Water_Frequency vs. Growth_Milestone
wg = %sql select Water_Frequency, Growth_Milestone from plant_growth_data;
plot = sns.jointplot(x='Growth_Milestone', y='Water_Frequency', data=wg.DataFrame())

+ The plots above suggest that, across the observed water frequencies and sunlight hours, a growth event occurs for roughly 50% of the plants in the dataset.

#### 3. Relationship between Soil Type and Growth Event

In [None]:
sg = %sql select Soil_Type, Growth_Milestone from plant_growth_data;
plot = sns.jointplot(x='Growth_Milestone', y='Soil_Type', data=sg.DataFrame())

+ Here too, a growth event occurs for roughly 50% of the plants, regardless of soil type.

In [None]:
# Fertilizer_Type vs. Growth_Milestone
fg = %sql select Fertilizer_Type, Growth_Milestone from plant_growth_data;
plot = sns.jointplot(x='Growth_Milestone', y='Fertilizer_Type', data=fg.DataFrame())
# Temperature vs. Growth_Milestone
tg = %sql select Temperature, Growth_Milestone from plant_growth_data;
plot = sns.jointplot(x='Growth_Milestone', y='Temperature', data=tg.DataFrame())
# Humidity vs. Growth_Milestone
hg = %sql select Humidity, Growth_Milestone from plant_growth_data;
plot = sns.jointplot(x='Growth_Milestone', y='Humidity', data=hg.DataFrame())

+ Again, a growth event occurs for roughly 50% of the plants. The plots also suggest that the chance of a growth event not happening increases at temperatures around 35 °C and at humidity levels near 80.
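The temperature observation can also be checked numerically by grouping temperatures into bins and computing the share of plants without a growth event in each bin. The sketch below uses 5-degree bins (an arbitrary width) formed with SQLite integer arithmetic; the rows are made up, so the output illustrates the query rather than the dataset's actual result:

```python
import sqlite3

# Illustrative (Temperature, Growth_Milestone) rows only, not real data.
sample = [(22.0, 1), (23.5, 0), (27.1, 1), (28.4, 1),
          (31.0, 1), (35.6, 0), (36.1, 0)]

demo_con = sqlite3.connect(":memory:")
demo_con.execute("create table plant_growth_data (Temperature real, Growth_Milestone integer)")
demo_con.executemany("insert into plant_growth_data values (?, ?)", sample)

# cast(x / 5 as integer) * 5 maps 23.5 -> 20, 35.6 -> 35, and so on.
# 1 - avg(Growth_Milestone) is the share of plants without a growth event.
bins = demo_con.execute(
    "select cast(Temperature / 5 as integer) * 5 as temp_bin, "
    "count(*) as plants, 1 - avg(Growth_Milestone) as no_growth_rate "
    "from plant_growth_data group by temp_bin order by temp_bin"
).fetchall()
for temp_bin, plants, no_growth in bins:
    print(f"{temp_bin}-{temp_bin + 5} °C: {plants} plants, {no_growth:.0%} without growth")
demo_con.close()
```

The same query works for Humidity by swapping the column name and choosing a suitable bin width.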

## Conclusion
**Overall, the analysis shows that a growth event occurs for nearly 50% of the plants in the dataset. Sunlight is important for plants, but too much sunlight can hinder their growth. High temperatures and high humidity also appear to obstruct plant growth. Based on this dataset, a plant is most likely to thrive if it is watered daily, given organic fertilizer, and planted in loam soil.**


In [None]:
con.close()
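Using an `sqlite3` connection as a context manager (`with sqlite3.connect(...) as conn:`) only commits or rolls back the current transaction; it does not close the connection. `contextlib.closing` is one way to make sure the connection is closed even if a cell raises an error. A sketch on an in-memory database with a made-up table `t`:

```python
import sqlite3
from contextlib import closing

with closing(sqlite3.connect(":memory:")) as demo_con:
    # The inner `with demo_con:` commits on success and rolls back on error.
    with demo_con:
        demo_con.execute("create table t (x integer)")
        demo_con.execute("insert into t values (1)")
    total = demo_con.execute("select count(*) from t").fetchone()[0]

# After the outer block the connection is closed, so further use raises an error.
try:
    demo_con.execute("select 1")
    closed = False
except sqlite3.ProgrammingError:
    closed = True
print(total, closed)  # 1 True
```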

#### Author
[Tanmay](https://github.com/otanmayo "Github Tanmay")