## Problem statement:    
### Overview:    
There are over 4,000 agriculture markets (commonly known as mandis) in the country.    
Prices in these markets fluctuate every day based on the supply of and demand for each crop.  
Predicting crop prices is one of the most important tasks for ensuring efficient crop planning and food safety in the country.  
This problem statement concerns predicting prices for the crop Potato in the district “Agra” in the state of Uttar Pradesh for the year 2020.  

### Data:
i. The historical data for prices in district “Agra” of state “Uttar Pradesh” are reported daily on Agmarknet.  
ii. Prices for a particular date (say 20 Mar 2021) can be extracted from a URL on Agmarknet:  
https://agmarknet.gov.in/SearchCmmMkt.aspx?Tx_Commodity=24&Tx_State=UP&Tx_District=1&Tx_Market=0&DateFrom=20-Mar-2021&DateTo=20-Mar-2021&Fr_Date=20-Mar-2021&To_Date=20-Mar-2021&Tx_Trend=0&Tx_CommodityHead=Potato&Tx_StateHead=Uttar+Pradesh&Tx_DistrictHead=Agra&Tx_MarketHead=--Select--
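
The query string above can also be assembled from its parts rather than concatenated by hand. The following is a minimal sketch, assuming the parameter names and values are exactly those in the example URL; `build_agmarknet_url`, `BASE_URL` and `MONTHS` are illustrative names, not part of Agmarknet or of the assignment.

```python
from datetime import date
from urllib.parse import urlencode

BASE_URL = "https://agmarknet.gov.in/SearchCmmMkt.aspx"
# Spelled out so the result does not depend on the system locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def build_agmarknet_url(day: date) -> str:
    """Return the search URL for one day (hypothetical helper)."""
    stamp = f"{day.day:02d}-{MONTHS[day.month - 1]}-{day.year}"
    params = {
        "Tx_Commodity": "24",
        "Tx_State": "UP",
        "Tx_District": "1",
        "Tx_Market": "0",
        # The same date fills all four date parameters.
        "DateFrom": stamp,
        "DateTo": stamp,
        "Fr_Date": stamp,
        "To_Date": stamp,
        "Tx_Trend": "0",
        "Tx_CommodityHead": "Potato",
        "Tx_StateHead": "Uttar Pradesh",  # urlencode turns the space into '+'
        "Tx_DistrictHead": "Agra",
        "Tx_MarketHead": "--Select--",
    }
    return BASE_URL + "?" + urlencode(params)


print(build_agmarknet_url(date(2021, 3, 20)))
```

Keeping the parameters in a dictionary makes it easier to see which values would change for another commodity or district, although the codes for those are not given in this document.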

### Description:
The following tasks need to be completed:  
a. Write a Python script to fetch price data for the year 2020 (date-wise, from 1st Jan 2020 to 31st Dec 2020) for district “Agra” of Uttar Pradesh from the data sources mentioned in the data section (point b can be taken as a reference). The expected output schema is as follows:  
b. Identify major markets for the district “Agra” and plot price patterns for each of them. What patterns do you identify?    
c. Comment on how you can leverage machine learning to predict prices for a given market in Agra for the crop “Potato”.    
i. What are the data pre-processing / cleaning techniques you would apply?    
ii. What are the features you would use to create the model?    
iii. How would you frame this problem as a machine learning problem? What would be the target variable?    
iv. Which algorithm would you use for price prediction?    
v. What would be the loss function you would use?    
vi. Any other comments you want to add?    
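
One possible way to approach questions (ii) and (iii) is to treat the task as supervised regression, with earlier modal prices of the same market as features and the next reported modal price as the target. The sketch below is only an illustration of that framing on synthetic numbers; `add_lag_features`, the lag choices and the `target_next_modal` column are assumptions, not a prescribed solution.

```python
import pandas as pd


def add_lag_features(prices: pd.DataFrame, lags=(1, 7)) -> pd.DataFrame:
    """Add per-market lagged modal prices and a next-observation target."""
    out = prices.sort_values(["Market_Name", "Price_Date"]).copy()
    for lag in lags:
        # Shift within each market so prices never leak across markets.
        out[f"modal_lag_{lag}"] = out.groupby("Market_Name")["Modal_Price"].shift(lag)
    # Target: the modal price at the next reported row for the same market.
    out["target_next_modal"] = out.groupby("Market_Name")["Modal_Price"].shift(-1)
    return out.dropna().reset_index(drop=True)


# Synthetic data for illustration only, not real Agmarknet prices.
toy = pd.DataFrame({
    "Market_Name": ["Agra"] * 5,
    "Price_Date": pd.date_range("2020-01-01", periods=5),
    "Modal_Price": [1000, 1010, 1020, 1030, 1040],
})
features = add_lag_features(toy, lags=(1, 2))
print(features)
```

Note that `shift` moves by rows, not by calendar days, so if a market does not report on some days the lags span uneven intervals; reindexing to a daily calendar first would be one way to handle that.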

### Output:
i. Please share the Python script that extracts prices as mentioned in the description. Provide instructions to run the script (in a README file) and also share dependencies, if any.  
ii. Collate the output of points (b) and (c) into a Word document.  

Please collate all these files into a GitHub repository and mail the access link to admin@agrilinks.in.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
from datetime import datetime
%matplotlib inline

In [None]:
from urllib.request import urlopen
from bs4 import BeautifulSoup

In [None]:
def date_generator(start_date, end_date):
    # One "DD-Mon-YYYY" string per calendar day, the format used in the Agmarknet URL.
    date_range = pd.date_range(start=start_date, end=end_date)
    return date_range.strftime("%d-%b-%Y").tolist()

In [None]:
def url_gen(date_li):
    # One search URL per date; the same date fills all four date parameters.
    base = "https://agmarknet.gov.in/SearchCmmMkt.aspx?Tx_Commodity=24&Tx_State=UP&Tx_District=1&Tx_Market=0"
    tail = "&Tx_Trend=0&Tx_CommodityHead=Potato&Tx_StateHead=Uttar+Pradesh&Tx_DistrictHead=Agra&Tx_MarketHead=--Select--"
    return [f"{base}&DateFrom={date}&DateTo={date}&Fr_Date={date}&To_Date={date}{tail}" for date in date_li]

In [None]:
def extract_date_price(url_li):
    price_list = []
    for url in url_li:
        # Fetch and parse one day's result page.
        with urlopen(url) as html:
            soup = BeautifulSoup(html, 'lxml')

        # Flatten every table row to its stripped text, then skip the header row.
        td_rows = [row.get_text().strip() for row in soup.find_all('tr')]
        td_rows = td_rows[1:]

        for line in td_rows:
            # Join multi-word values with '_' so they survive the whitespace split,
            # then turn the line breaks between cells into single spaces.
            line = line.replace(' ', '_').replace('\r', '').replace('\n', ' ')
            if len(line) > 0:
                price_list.append(' '.join(line.split()).split(' ', maxsplit=10))

    # The leading columns 0 and 1 are not part of the output; name the remaining nine.
    columns = {2: 'District_Name', 3: 'Market_Name', 4: 'Commodity', 5: 'Variety', 6: 'Grade',
               7: 'Min_Price', 8: 'Max_Price', 9: 'Modal_Price', 10: 'Price_Date'}
    return pd.DataFrame(data=price_list).drop(columns=[0, 1]).rename(columns=columns)
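
The function above splits each row's text on whitespace, which is why multi-word values come out joined with underscores. An alternative is to read the table cell by cell. The sketch below is a hedged illustration of that idea: `SAMPLE_HTML` is a made-up fragment, and the assumed layout (a serial-number cell followed by the nine output fields) has not been checked against the live Agmarknet page. `parse_price_rows` and `COLUMNS` are illustrative names.

```python
from bs4 import BeautifulSoup

COLUMNS = ["District_Name", "Market_Name", "Commodity", "Variety", "Grade",
           "Min_Price", "Max_Price", "Modal_Price", "Price_Date"]

# Hypothetical fragment; the real page markup may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Sl no.</th><th>District Name</th><th>Market Name</th><th>Commodity</th>
      <th>Variety</th><th>Grade</th><th>Min Price</th><th>Max Price</th>
      <th>Modal Price</th><th>Price Date</th></tr>
  <tr><td>1</td><td>Agra</td><td>Fatehpur Sikri</td><td>Potato</td><td>Local</td>
      <td>FAQ</td><td>1400</td><td>1520</td><td>1455</td><td>01 Jan 2020</td></tr>
</table>
"""


def parse_price_rows(html: str) -> list:
    """Return one dict per data row, reading each <td> separately."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for row in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        # Assumed layout: serial number first, then the nine named fields.
        # Header rows (only <th> cells) and malformed rows are skipped.
        if len(cells) != len(COLUMNS) + 1:
            continue
        records.append(dict(zip(COLUMNS, cells[1:])))
    return records


rows = parse_price_rows(SAMPLE_HTML)
print(rows)
```

With this approach a value such as "Fatehpur Sikri" keeps its space, and a page with no data rows simply yields an empty list.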

In [None]:
# input() already returns a string, so no conversion is needed.
start = input("Enter start date in the format YYYY-MM-DD: ")
end = input("Enter end date in the format YYYY-MM-DD: ")

print("The start date is "+start)
print("The end date is "+end)

Enter start date in the format YYYY-MM-DD: 2020-01-01
Enter end date in the format YYYY-MM-DD: 2020-12-31
The start date is 2020-01-01
The end date is 2020-12-31


In [None]:
date_list = date_generator(start,end)
date_list#[0:5]

['01-Jan-2020',
 '02-Jan-2020',
 '03-Jan-2020',
 '04-Jan-2020',
 '05-Jan-2020',
 '06-Jan-2020',
 '07-Jan-2020',
 '08-Jan-2020',
 '09-Jan-2020',
 '10-Jan-2020',
 '11-Jan-2020',
 '12-Jan-2020',
 '13-Jan-2020',
 '14-Jan-2020',
 '15-Jan-2020',
 '16-Jan-2020',
 '17-Jan-2020',
 '18-Jan-2020',
 '19-Jan-2020',
 '20-Jan-2020',
 '21-Jan-2020',
 '22-Jan-2020',
 '23-Jan-2020',
 '24-Jan-2020',
 '25-Jan-2020',
 '26-Jan-2020',
 '27-Jan-2020',
 '28-Jan-2020',
 '29-Jan-2020',
 '30-Jan-2020',
 '31-Jan-2020',
 '01-Feb-2020',
 '02-Feb-2020',
 '03-Feb-2020',
 '04-Feb-2020',
 '05-Feb-2020',
 '06-Feb-2020',
 '07-Feb-2020',
 '08-Feb-2020',
 '09-Feb-2020',
 '10-Feb-2020',
 '11-Feb-2020',
 '12-Feb-2020',
 '13-Feb-2020',
 '14-Feb-2020',
 '15-Feb-2020',
 '16-Feb-2020',
 '17-Feb-2020',
 '18-Feb-2020',
 '19-Feb-2020',
 '20-Feb-2020',
 '21-Feb-2020',
 '22-Feb-2020',
 '23-Feb-2020',
 '24-Feb-2020',
 '25-Feb-2020',
 '26-Feb-2020',
 '27-Feb-2020',
 '28-Feb-2020',
 '29-Feb-2020',
 '01-Mar-2020',
 '02-Mar-2020',
 '03-Mar

In [None]:
url_list = url_gen(date_list)
url_list#[0:5]

['https://agmarknet.gov.in/SearchCmmMkt.aspx?Tx_Commodity=24&Tx_State=UP&Tx_District=1&Tx_Market=0&DateFrom=01-Jan-2020&DateTo=01-Jan-2020&Fr_Date=01-Jan-2020&To_Date=01-Jan-2020&Tx_Trend=0&Tx_CommodityHead=Potato&Tx_StateHead=Uttar+Pradesh&Tx_DistrictHead=Agra&Tx_MarketHead=--Select--',
 'https://agmarknet.gov.in/SearchCmmMkt.aspx?Tx_Commodity=24&Tx_State=UP&Tx_District=1&Tx_Market=0&DateFrom=02-Jan-2020&DateTo=02-Jan-2020&Fr_Date=02-Jan-2020&To_Date=02-Jan-2020&Tx_Trend=0&Tx_CommodityHead=Potato&Tx_StateHead=Uttar+Pradesh&Tx_DistrictHead=Agra&Tx_MarketHead=--Select--',
 'https://agmarknet.gov.in/SearchCmmMkt.aspx?Tx_Commodity=24&Tx_State=UP&Tx_District=1&Tx_Market=0&DateFrom=03-Jan-2020&DateTo=03-Jan-2020&Fr_Date=03-Jan-2020&To_Date=03-Jan-2020&Tx_Trend=0&Tx_CommodityHead=Potato&Tx_StateHead=Uttar+Pradesh&Tx_DistrictHead=Agra&Tx_MarketHead=--Select--',
 'https://agmarknet.gov.in/SearchCmmMkt.aspx?Tx_Commodity=24&Tx_State=UP&Tx_District=1&Tx_Market=0&DateFrom=04-Jan-2020&DateTo=04-Jan

In [None]:
date_price_list = extract_date_price(url_list)
date_price_list

Unnamed: 0,District_Name,Market_Name,Commodity,Variety,Grade,Min_Price,Max_Price,Modal_Price,Price_Date
0,Agra,Achnera,Potato,Desi,FAQ,1300,1400,1350,01_Jan_2020
1,Agra,Fatehpur_Sikri,Potato,Local,FAQ,1400,1520,1455,01_Jan_2020
2,Agra,Jagnair,Potato,Desi,FAQ,1250,1350,1300,01_Jan_2020
3,Agra,Jarar,Potato,Desi,FAQ,1200,1300,1250,01_Jan_2020
4,Agra,Khairagarh,Potato,Desi,FAQ,1200,1300,1250,01_Jan_2020
...,...,...,...,...,...,...,...,...,...
1823,Agra,Agra,Potato,Desi,FAQ,800,1100,960,31_Dec_2020
1824,Agra,Fatehabad,Potato,Desi,FAQ,700,800,750,31_Dec_2020
1825,Agra,Fatehpur_Sikri,Potato,Local,FAQ,900,1100,1015,31_Dec_2020
1826,Agra,Jagnair,Potato,Desi,FAQ,750,850,800,31_Dec_2020


Unnamed: 0,District_Name,Market_Name,Commodity,Variety,Grade,Min_Price,Max_Price,Modal_Price,Price_Date
0,Agra,Achnera,Potato,Desi,FAQ,1300,1400,1350,01_Jan_2020
1,Agra,Fatehpur_Sikri,Potato,Local,FAQ,1400,1520,1455,01_Jan_2020
2,Agra,Jagnair,Potato,Desi,FAQ,1250,1350,1300,01_Jan_2020
3,Agra,Jarar,Potato,Desi,FAQ,1200,1300,1250,01_Jan_2020
4,Agra,Khairagarh,Potato,Desi,FAQ,1200,1300,1250,01_Jan_2020
...,...,...,...,...,...,...,...,...,...
198,Agra,Achnera,Potato,Desi,FAQ,1100,1180,1140,31_Jan_2020
199,Agra,Agra,Potato,Desi,FAQ,900,1060,1000,31_Jan_2020
200,Agra,Jagnair,Potato,Desi,FAQ,1200,1250,1225,31_Jan_2020
201,Agra,Jarar,Potato,Desi,FAQ,760,840,800,31_Jan_2020
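
Before plotting price patterns per market (task b), the scraped strings need to be converted to proper types. The following is a minimal sketch, assuming the column names and the `DD_Mon_YYYY` date format shown in the output above; `tidy_prices` is an illustrative helper, and the three sample rows are copied from that output.

```python
import pandas as pd

COLUMNS = ["District_Name", "Market_Name", "Commodity", "Variety", "Grade",
           "Min_Price", "Max_Price", "Modal_Price", "Price_Date"]


def tidy_prices(raw: pd.DataFrame) -> pd.DataFrame:
    """Parse dates, make prices numeric and restore spaces in market names."""
    tidy = raw.copy()
    tidy["Price_Date"] = pd.to_datetime(tidy["Price_Date"], format="%d_%b_%Y")
    for col in ("Min_Price", "Max_Price", "Modal_Price"):
        # Unparseable values become NaN instead of raising.
        tidy[col] = pd.to_numeric(tidy[col], errors="coerce")
    tidy["Market_Name"] = tidy["Market_Name"].str.replace("_", " ", regex=False)
    return tidy


sample = pd.DataFrame(
    [
        ["Agra", "Achnera", "Potato", "Desi", "FAQ", "1300", "1400", "1350", "01_Jan_2020"],
        ["Agra", "Fatehpur_Sikri", "Potato", "Local", "FAQ", "1400", "1520", "1455", "01_Jan_2020"],
        ["Agra", "Achnera", "Potato", "Desi", "FAQ", "1100", "1180", "1140", "31_Jan_2020"],
    ],
    columns=COLUMNS,
)

tidy = tidy_prices(sample)
# One column per market, one row per date: a convenient shape for line plots.
modal_by_market = tidy.pivot_table(
    index="Price_Date", columns="Market_Name", values="Modal_Price", aggfunc="mean"
)
print(modal_by_market)
```

Calling `modal_by_market.plot()` on the full-year frame would then draw one line per market, with gaps on dates where a market did not report.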
