<a href="https://www.kaggle.com/code/evansajumathew/sugarcane-production-country-anlaysis?scriptVersionId=143140831" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [None]:
# Installing Selenium and BeautifulSoup (recent Selenium 4 releases download a matching ChromeDriver automatically)
!pip install selenium beautifulsoup4

In [None]:
# Data processing and linear algebra libraries
import pandas as pd
import numpy as np

# Data visualization libraries
import seaborn as sns
import plotly.express as px

# Web scraping libraries
from selenium import webdriver
from bs4 import BeautifulSoup  # parses the HTML content of the webpage

import time

In [None]:
# Start the Chrome driver; this opens a separate Chrome window controlled by Selenium
browser = webdriver.Chrome()

# If you want to skip the web scraping part, the scraped CSV file is included in the current directory
# Note: uncomment the line below and skip ahead to the 8th input cell of this notebook
#df = pd.read_csv('data.csv')

In [None]:
# Providing the webpage link
browser.get('https://www.atlasbig.com/en-gb/countries-by-sugarcane-production')
time.sleep(3) # Giving some time to load whole webpage 

soup = BeautifulSoup(browser.page_source,'html.parser') #Parsing the HTML code

In [None]:
# Store each scraped column in its own list
country_name = []
country_production_tons = []
production_per_person = []
acreage = []
production_yield = []

# Loop over every table row and extract the data cells
for sp in soup.find('tbody').find_all('tr'):
    cells = sp.find_all('td')  # find the cells once per row instead of five times
    country_name.append(cells[1].text.strip())             # country name
    country_production_tons.append(cells[2].text.strip())  # production in tonnes
    production_per_person.append(cells[3].text.strip())    # production per person in kg
    acreage.append(cells[4].text.strip())                  # acreage in hectares
    production_yield.append(cells[5].text.strip())         # yield in kg per hectare

In [None]:
#  Creating an empty dataframe
df = pd.DataFrame(columns=['country','production_(tons)','production_per_person_(kg)','acreage_(hectare)','yield_(kg_/_hectare)'])

In [None]:
# Fill each column of the empty DataFrame with the matching scraped list
df['country'] = country_name
df['production_(tons)'] = country_production_tons
df['production_per_person_(kg)'] = production_per_person
df['acreage_(hectare)'] = acreage
df['yield_(kg_/_hectare)'] = production_yield
df.head()
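The row loop and the column-by-column assignment above can be checked offline. The sketch below runs the same `soup.find('tbody').find_all('tr')` walk over a small hand-written HTML snippet and builds the DataFrame straight from a dict of lists. The snippet is hypothetical and only mimics the cell order the loop assumes (rank, country, production, per person, acreage, yield); the real page layout may differ. `parse_table`, `SAMPLE_HTML` and `COLUMNS` are illustrative names, not part of the original notebook.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical snippet mimicking the assumed layout:
# cell 0 = rank, then country, production, per person, acreage, yield
SAMPLE_HTML = """
<table><tbody>
  <tr><td>1</td><td>Country A</td><td>1,000</td><td>10</td><td>50</td><td>20,000</td></tr>
  <tr><td>2</td><td>Country B</td><td>500</td><td>N/A</td><td>N/A</td><td>N/A</td></tr>
</tbody></table>
"""

COLUMNS = ['country', 'production_(tons)', 'production_per_person_(kg)',
           'acreage_(hectare)', 'yield_(kg_/_hectare)']


def parse_table(html):
    """Return a dict mapping each column name to its list of cell texts (rank cell skipped)."""
    soup = BeautifulSoup(html, 'html.parser')
    columns = {name: [] for name in COLUMNS}
    for row in soup.find('tbody').find_all('tr'):
        cells = [td.get_text(strip=True) for td in row.find_all('td')]
        if len(cells) < 6:  # ignore rows that do not have all six cells
            continue
        for name, text in zip(COLUMNS, cells[1:6]):
            columns[name].append(text)
    return columns


# Building the DataFrame from a dict creates all five columns in one step
sample_df = pd.DataFrame(parse_table(SAMPLE_HTML))
print(sample_df)
```

Building from a dict means there is no empty DataFrame to fill. Pandas also raises a `ValueError` if the lists end up with different lengths, instead of silently misaligning rows.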

In [None]:
# Exporting the data into a CSV file named "data.csv"
df.to_csv('data.csv', index=False)

In [None]:
df.shape

<p>We have 104 rows and 5 columns. Let's check the data types and missing values.</p>

In [None]:
df.info()

In [None]:
df.isna().sum()
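At this point every column still holds strings. If a gap on the page arrived as the text 'N/A', `isna()` does not count it as missing. The sketch below uses a toy frame with hypothetical values to contrast counting real NaN values with counting the placeholder text.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) shaped like the scraped data: every cell is a string
raw = pd.DataFrame({
    'country': ['Country A', 'Country B'],
    'acreage_(hectare)': ['50', 'N/A'],
})

print(raw.isna().sum())      # 'N/A' is an ordinary string, so nothing is counted as missing
print((raw == 'N/A').sum())  # counting the placeholder text reveals the hidden gaps

# Replacing the placeholder turns it into a real missing value
cleaned = raw.replace('N/A', np.nan)
print(cleaned.isna().sum())
```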

<h2>Data Cleaning </h2>
We'll start by formatting the data. The numerical columns were scraped as text (object dtype), so each value is cleaned in three steps:

A) Firstly, we replace the 'N/A' placeholder with 'NaN' so that it does not raise an error during type conversion.
<br> </br>

B) Secondly, we remove the commas (thousands separators) from the numerical data.
<br> </br>

C) Finally, we convert the cleaned strings to float values.

For example:


<b>746,828,157  →  746828157</b>
<br> </br>

In [None]:
def number_formater(items):
    """Convert a scraped string such as '746,828,157' or 'N/A' to a float."""
    item = str(items)                  # make sure we are working with a string
    item = item.replace('N/A', 'NaN')  # float('NaN') gives a missing value instead of raising ValueError
    item = item.replace(',', '')       # drop the thousands separators
    return float(item)
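`number_formater` is applied value by value. The same cleaning rule can also be sketched with pandas' vectorized string methods and `pd.to_numeric`. The sample Series below is hypothetical.

```python
import pandas as pd

# Hypothetical scraped values: thousands separators plus the 'N/A' placeholder
raw = pd.Series(['746,828,157', '1,000', 'N/A'])

# Strip the commas, then let to_numeric turn anything unparsable ('N/A') into NaN
clean = pd.to_numeric(raw.str.replace(',', '', regex=False), errors='coerce')
print(clean)
```

There is one design difference. `errors='coerce'` also turns any other unexpected text into NaN without complaint, whereas `number_formater` raises a `ValueError`. The explicit function is arguably safer for spotting surprises in scraped data.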

In [None]:
# Apply number_formater() to every value in the four numerical columns
numeric_cols = ['production_(tons)', 'production_per_person_(kg)', 'acreage_(hectare)', 'yield_(kg_/_hectare)']
# Series.map works across pandas versions, while DataFrame.applymap is deprecated since pandas 2.1
df[numeric_cols] = df[numeric_cols].apply(lambda col: col.map(number_formater))
df.head()

In [None]:
df.info()

The numerical columns are now of float data type.

In [None]:
df.isnull().sum()

<h5>
Next, we will fill the null values in 'acreage_(hectare)' and 'yield_(kg_/_hectare)' with the mean of each column.
</h5>

In [None]:
df['acreage_(hectare)'] = df['acreage_(hectare)'].fillna(df['acreage_(hectare)'].mean())
df['yield_(kg_/_hectare)'] = df['yield_(kg_/_hectare)'].fillna(df['yield_(kg_/_hectare)'].mean())
df.head()
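The two `fillna` lines above can also be written as a single call. This works because `DataFrame.mean()` returns a Series of column means, and `fillna` aligns that Series on column names. The sketch below uses a toy frame; `df_demo` and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) with gaps in the two columns being imputed
df_demo = pd.DataFrame({
    'acreage_(hectare)': [10.0, np.nan, 30.0],
    'yield_(kg_/_hectare)': [50.0, 70.0, np.nan],
})

cols = ['acreage_(hectare)', 'yield_(kg_/_hectare)']
# mean() skips NaN and returns one mean per column;
# fillna() matches that Series to columns by name, so each column gets its own mean
df_demo[cols] = df_demo[cols].fillna(df_demo[cols].mean())
print(df_demo)
```

Either way, each filled value is a column average, not an estimate for that particular country. Imputed rows can therefore look out of place in per-country comparisons.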

<h1>Let's start analyzing the Data!</h1>

<h3>1). Which countries produce the most sugar cane? (Top 15 countries) </h3>

In [None]:
total_production = df['production_(tons)'].sum()
total_production

In [None]:
countries_production =df[['country','production_(tons)']].sort_values(by='production_(tons)',ascending=False).iloc[:15]
countries_production['percentage'] = ((countries_production['production_(tons)'] / total_production *100)).round(2).astype(str)+'%'
countries_production.head(5)
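Storing the percentage as a string, as in the cell above, is fine for display. A numeric share, however, makes claims such as the combined share of the top two countries easy to check. The minimal sketch below uses made-up numbers rather than the scraped data, with `nlargest` as a shorthand for sorting and slicing.

```python
import pandas as pd

# Hypothetical production figures, not the scraped data
demo = pd.DataFrame({
    'country': ['A', 'B', 'C', 'D'],
    'production_(tons)': [500.0, 300.0, 150.0, 50.0],
})

total = demo['production_(tons)'].sum()
top = demo.nlargest(3, 'production_(tons)').copy()           # same rows as sort_values(...).iloc[:3]
top['share_pct'] = top['production_(tons)'] * 100 / total   # numeric, so it can be summed
top['cumulative_pct'] = top['share_pct'].cumsum()           # running share of the total
print(top)
```

On the real data, the second value of such a cumulative column would be the combined share of the two largest producers.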

In [None]:
fig = px.bar(countries_production,
            x='country',
            y='production_(tons)',
            labels = {'production_(tons)':'Production (Tons)','country':'Country'},
            title = 'Top 15 countries producing Sugar Cane')
fig.show()

<div class="alert alert-block alert-info">
<b>Statement:</b> Brazil and India are the top two sugar cane producers, with a combined share of 59.19% of the total production in the dataset, almost 60% of the world's sugar cane. Many of the other top producers are in Asia.
</div>

<h3> 2). Do countries with more land (acreage_(hectare)) produce more sugar cane? </h3>

In [None]:
fig = px.scatter(df,
            x='acreage_(hectare)',
            y='production_(tons)',
            size = 'production_(tons)',
            hover_data=['country'],
            labels = {'acreage_(hectare)':'Acreage (Hectare)','production_(tons)':'Production (Tons)','country':'Country'},
            title = 'Do countries with more land (acreage_(hectare)) produce more sugar cane?')
fig.show()

<div class="alert alert-block alert-info">
<b>Statement:</b> Acreage (hectare) and Production (tonnes) are positively related: countries with more land under sugar cane tend to produce more. This chart alone does not show whether those countries also have better yields.
</div>
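The "positive link" described above can be quantified with a correlation coefficient. The sketch below uses hypothetical values; on the real data the same calls would take `df['acreage_(hectare)']` and `df['production_(tons)']`. Spearman's rank correlation is an addition not used in the notebook. Here it is computed as the Pearson correlation of the ranks, which makes it less sensitive to a few very large producers.

```python
import pandas as pd

# Hypothetical values chosen only to illustrate the calculation
demo = pd.DataFrame({
    'acreage_(hectare)': [100.0, 200.0, 300.0, 400.0],
    'production_(tons)': [5000.0, 9000.0, 16000.0, 19000.0],
})

# Pearson r measures how close the relationship is to a straight line
pearson = demo['acreage_(hectare)'].corr(demo['production_(tons)'])

# Spearman rho = Pearson r of the ranks, so it only asks "does more land mean more output?"
ranks = demo.rank()
spearman = ranks['acreage_(hectare)'].corr(ranks['production_(tons)'])

print(round(pearson, 3), round(spearman, 3))
```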

<h3> 3). Do countries with high production per person produce more sugar cane? </h3>

In [None]:
fig = px.scatter(df,
                y='production_(tons)',
                x='production_per_person_(kg)',
                color ='country',
                size='production_per_person_(kg)',
                hover_data = ['country','yield_(kg_/_hectare)'],
                hover_name='country',
                labels = {'production_per_person_(kg)':'Production Per Person (kg)',
                          'production_(tons)':'Production (Tons)',
                          'country':'Country'},
                title = 'Do countries with high production per person produce more sugar cane?')
fig.show()

<div class="alert alert-block alert-info">
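<b>Statement:</b>
Most countries produce less than 200M tonnes of sugar cane even when their production per person is high. Because production per person is total production divided by population, a small country can score high on this metric while producing little overall. This suggests that other factors, such as acreage (land under cultivation), have a larger influence on total sugar cane production.
</div>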


<h3> 4). Do countries with high yields produce more Sugar Cane? </h3>

In [None]:
fig = px.scatter(df,
                y='production_(tons)',
                x='yield_(kg_/_hectare)',
                color ='country',
                size='yield_(kg_/_hectare)',
                hover_data = ['country','yield_(kg_/_hectare)'],
                hover_name='country',
                labels = {'production_(tons)':'Production (Tons)',
                          'yield_(kg_/_hectare)':'Yield (kg/hectare)',
                          'country':'Country'},
                title = 'Do countries with high yields produce more Sugar Cane?')
fig.show()

<div class="alert alert-block alert-info">
<b>Statement:</b> Even with high yields, most countries produce less than 200 million tonnes of sugar cane. This implies that other factors, such as acreage (land under cultivation), have a larger influence on overall sugar cane output.
</div>

<h3> 5). Which country has the highest production per person? </h3>

In [None]:
production_per_person = df[['country','production_per_person_(kg)']].sort_values(by='production_per_person_(kg)',ascending=False).head(20)  # top 20, matching the chart title
production_per_person.head(5)
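A quick arithmetic check helps interpret `production_per_person_(kg)`. Assume the 746,828,157 t used in the cleaning example is Brazil's production, and take the 3564 kg figure quoted for Brazil in the statement below. The implied population then comes out at roughly 209.5 million, which is about the size of Brazil's population. That suggests the column measures output per inhabitant (production divided by population) rather than per worker.

```python
# Assumption: 746,828,157 t (cleaning example) is Brazil's production,
# and 3564 kg is Brazil's production per person (statement below).
production_tons = 746_828_157
per_person_kg = 3_564

# If per-person production = production / population, the implied population is:
implied_population = production_tons * 1000 / per_person_kg  # tonnes -> kg, then divide
print(f"{implied_population / 1e6:.1f} million people")
```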

In [None]:
fig = px.bar(production_per_person,
             x='country',
             y='production_per_person_(kg)',
             category_orders = {'country':production_per_person['country']},
             labels = {'production_per_person_(kg)':'Production Per Person (kg)',
                      'country':'Country'},
             title = 'Which country has the highest production per person? (Top 20 countries)'
             )
fig.show()

<div class="alert alert-block alert-info">
<b>Statement:</b> Surprisingly, Swaziland and Belize have the highest production per person, each producing over 4000 kg of sugar cane per inhabitant per year. Brazil ranks third, with 3564 kg of sugar cane produced per person. India, despite being the second-largest producer, is not in the top 20 because its output is spread over a very large population.
</div>

<h3> 6). Which country has the highest yield </h3>

In [None]:
country_yield = df[['country','yield_(kg_/_hectare)']].sort_values(by='yield_(kg_/_hectare)',ascending=False).head(20)  # top 20, matching the chart title
country_yield.head(5)
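The yield column should roughly equal production divided by acreage once tonnes are converted to kilograms. This is an assumption about how the site derives yield. The sketch below uses hypothetical rows and flags rows where the two disagree. This is one way to spot values that came from the mean imputation rather than from the source.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the second mimics a country whose gaps were filled with column means
demo = pd.DataFrame({
    'production_(tons)': [1000.0, 400.0],
    'acreage_(hectare)': [50.0, 100.0],
    'yield_(kg_/_hectare)': [20000.0, 60000.0],
})

# Assumption: yield (kg/ha) = production (t) * 1000 (kg per t) / acreage (ha)
demo['implied_yield'] = demo['production_(tons)'] * 1000 / demo['acreage_(hectare)']

# Allow 1% relative tolerance for rounding on the website
demo['consistent'] = np.isclose(demo['implied_yield'], demo['yield_(kg_/_hectare)'], rtol=0.01)
print(demo)
```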

In [None]:
fig = px.bar(country_yield,
             x='country',
             y='yield_(kg_/_hectare)',
             category_orders = {'country':country_yield['country']},
             labels = {'yield_(kg_/_hectare)':'Yield (kg / hectare)',
                      'country':'Country'},
             title = 'Which country has the highest yield? (Top 20 countries)'
             )
fig.show()

<div class="alert alert-block alert-info">
<b>Statement:</b> The highest yields are found in Guatemala and Peru. Many African countries also have high yields, yet they do not produce as much sugar cane as countries such as Brazil. This may be because many African countries devote more of their farmland to other crops such as maize or cassava.
</div>

<h3> 7). Overall map of countries and their production of sugar cane </h3>

In [None]:

fig = px.choropleth(df, locations="country",
                    color="production_(tons)",
                    locationmode='country names',
                    hover_name="country",
                    hover_data=list(df.columns),
                    labels={'production_(tons)':'Production (Tons)'},
                    title='Map of countries and their production of Sugarcane'
                    )
fig.show()