# Capstone Project - The Battle of Neighborhoods (Week 1)

### **Question 1:** 
   Clearly define a problem or an idea of your choice, where you would need to leverage the Foursquare location data to solve or execute. Remember that data science problems always target an audience and are meant to help a group of stakeholders solve a problem, so make sure that you explicitly describe your audience and why they would care about your problem.
   
### **Answer 1:** 
   **To give companies and crop-processing factories the visualizations they need, I will analyse crop production data for the different regions of India in combination with the Foursquare API.**



### **Question 2:**
   Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data.
   
### **Answer 2:**
   **For this project I will use the Kaggle crop production dataset (https://www.kaggle.com/divyosmi2009/crop-production-in-india-statevise?select=crop_production.csv), which contains more than 200,000 rows of detailed crop production records for India, broken down by region and year, in combination with Foursquare location data. The distribution of the year column is summarized below:**

| Statistic | Year |
| --------- | ---: |
| Min       | 1997 |
| 25%       | 2002 |
| 50%       | 2006 |
| 75%       | 2010 |
| Max       | 2015 |
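
A year summary of this kind can be produced with pandas' `describe()`. The sketch below uses a tiny made-up series (not the real dataset) purely to show the call; on the actual data the same method would be applied to `df['Crop_Year']`.

```python
import pandas as pd

# Toy stand-in for df['Crop_Year']; these five values are illustrative only
crop_year = pd.Series([1997, 2002, 2006, 2010, 2015], name='Crop_Year')

# describe() reports count, mean, std, min, quartiles and max
summary = crop_year.describe()
print(summary[['min', '25%', '50%', '75%', 'max']])
```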


In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests

#!conda install -c conda-forge beautifulsoup4 --yes
from bs4 import BeautifulSoup

# pandas.io.json.json_normalize is deprecated; the top-level function needs pandas >= 1.0
from pandas import json_normalize # transform JSON file into a pandas dataframe

import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

In [2]:
from __future__ import print_function
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

### Reading Raw data from downloaded csv file

In [3]:
df = pd.read_csv('crop_production.csv')
sh = df.shape
print('This DataFrame Contains {} rows and {} columns.'.format(sh[0], sh[1]))
df.head()

This DataFrame Contains 246091 rows and 7 columns.


Unnamed: 0,State_Name,District_Name,Crop_Year,Season,Crop,Area,Production
0,Andaman and Nicobar Islands,NICOBARS,2000,Kharif,Arecanut,1254.0,2000.0
1,Andaman and Nicobar Islands,NICOBARS,2000,Kharif,Other Kharif pulses,2.0,1.0
2,Andaman and Nicobar Islands,NICOBARS,2000,Kharif,Rice,102.0,321.0
3,Andaman and Nicobar Islands,NICOBARS,2000,Whole Year,Banana,176.0,641.0
4,Andaman and Nicobar Islands,NICOBARS,2000,Whole Year,Cashewnut,720.0,165.0


In [4]:
opt_crop = sorted([i for i in set(df['Crop'])])
print('Please select as few as possible(Maximum 5) to get faster and perfect view!')
x = widgets.SelectMultiple(
    options=opt_crop,
    value=['Rice', 'Wheat'],
    rows=10,
    description='Select Crop: ',
    disabled=False,
)
display(x)

Please select as few as possible(Maximum 5) to get faster and perfect view!


SelectMultiple(description='Select Crop: ', index=(95, 119), options=('Apple', 'Arcanut (Processed)', 'Arecanu…

In [5]:
crops = x.value
print('Your Selections: ', crops)

Your Selections:  ('Rice', 'Wheat')


In [6]:
opt_year = [i for i in range(1997, 2016)]
print('Please select as few as possible to get faster view!')
y = widgets.SelectMultiple(
    options = opt_year,
    value=[2011, 2012, 2013, 2014, 2015],
    rows=10,
    description='Select Year: ',
    disabled=False,
)
display(y)

Please select as few as possible to get faster view!


SelectMultiple(description='Select Year: ', index=(14, 15, 16, 17, 18), options=(1997, 1998, 1999, 2000, 2001,…

In [7]:
years = y.value
print('Your Selections: ', years)

Your Selections:  (2011, 2012, 2013, 2014, 2015)


In [8]:
opt_ses = [i for i in set(df['Season'])]
print('Please select as few as possible to get faster view!')
z = widgets.SelectMultiple(
    options = opt_ses,
    value=['Summer     ', 'Winter     ', 'Whole Year ', 'Rabi       ', 'Autumn     ', 'Kharif     '],
    rows=6,
    description='Select Season: ',
    disabled=False,
)
display(z)

Please select as few as possible to get faster view!


SelectMultiple(description='Select Season: ', index=(0, 1, 2, 3, 4, 5), options=('Summer     ', 'Winter     ',…

In [9]:
seasons = z.value
print('Your Selections: ', seasons)

Your Selections:  ('Summer     ', 'Winter     ', 'Whole Year ', 'Rabi       ', 'Autumn     ', 'Kharif     ')
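
The season labels in this dataset carry trailing spaces, which is why the widget defaults above have to be written with padding. A possible alternative, shown here on a small illustrative series rather than the real column, is to strip the labels once before building the options:

```python
import pandas as pd

# Illustrative stand-in for df['Season']; the padding mimics the raw labels
season_raw = pd.Series(['Kharif     ', 'Rabi       ', 'Whole Year ', 'Kharif     '])

# Remove surrounding whitespace, then build a sorted, de-duplicated option list
season_clean = season_raw.str.strip()
options = sorted(season_clean.unique())
print(options)  # ['Kharif', 'Rabi', 'Whole Year']
```

If the column is cleaned this way, the selected values must be compared against the cleaned column as well, not the raw one.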


### Creating required dataframe

In [10]:
# one flag column per selected year and per selected season
df2 = pd.DataFrame(columns=['State_Name', 'District_Name', 'Latitude', 'Longitude']
                   + list(years)
                   + list(seasons)
                   + ['Crop', 'Production/Area'])
df2

Unnamed: 0,State_Name,District_Name,Latitude,Longitude,2011,2012,2013,2014,2015,Summer,Winter,Whole Year,Rabi,Autumn,Kharif,Crop,Production/Area


In [11]:
# define a function to retrieve location data for a given district and state
def get_loc(dist, state):
    address = '{}, {}'.format(dist, state)
    geolocator = Nominatim(user_agent="ny_explorer")
    location = geolocator.geocode(address)
    print(location)
    if location is None:
        # fall back to the district on its own, then to the state
        locationd = geolocator.geocode(dist)
        locations = geolocator.geocode(state)
        if locationd is None or locations is None:
            # use whichever lookup succeeded (the result may still be None)
            location = locations if locationd is None else locationd
        elif (abs(locations.latitude-locationd.latitude)>10) or (abs(locations.longitude-locationd.longitude)>10):
            # the district match is implausibly far from the state, so distrust it
            location = locations
        else:
            location = locationd

    return location
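
Since `get_loc` can issue up to three Nominatim requests per district, repeated lookups are slow, and the public Nominatim service asks clients to stay at roughly one request per second. The following is a minimal sketch of a caching, throttling wrapper, not part of the original notebook: `make_cached_geocoder`, `min_delay` and `fake_geocode` are hypothetical names, and the fake geocoder only stands in for a real `geocode` callable so the example runs offline.

```python
import time

def make_cached_geocoder(geocode, min_delay=1.0):
    """Wrap a geocode callable with a cache and a minimum delay between calls."""
    cache = {}
    last_call = [float('-inf')]  # time of the previous real lookup

    def lookup(address):
        if address in cache:              # repeated addresses cost nothing
            return cache[address]
        wait = min_delay - (time.monotonic() - last_call[0])
        if wait > 0:                      # throttle only real lookups
            time.sleep(wait)
        result = geocode(address)
        last_call[0] = time.monotonic()
        cache[address] = result           # failed lookups (None) are cached too
        return result

    return lookup

# Stand-in geocoder (assumption): records calls and returns a fixed coordinate pair
calls = []
def fake_geocode(address):
    calls.append(address)
    return (14.65, 77.55)

lookup = make_cached_geocoder(fake_geocode, min_delay=0.0)
first = lookup('ANANTAPUR, Andhra Pradesh')
second = lookup('ANANTAPUR, Andhra Pradesh')
print(len(calls))  # 1
```

In the notebook, the wrapped callable could be the `geocode` method of a single shared `Nominatim` instance. geopy also provides `geopy.extra.rate_limiter.RateLimiter` for the throttling part.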


### Filling Required Dataframe

In [12]:
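dist = df['District_Name'][0]
state = df['State_Name'][0]
location = get_loc(dist, state)
# get_loc can return None when every lookup fails; store NaN in that case
lat, long = (location.latitude, location.longitude) if location is not None else (np.nan, np.nan)

rows = []  # DataFrame.append was removed in pandas 2.0, so collect rows in a list
for tup in zip(df['District_Name'],
               df['State_Name'],
               df['Crop_Year'],
               df['Season'],
               df['Crop'],
               df['Area'],
               df['Production']):

    # geocode only when the district differs from the previous row
    if tup[0] != dist:
        location = get_loc(tup[0], tup[1])
        lat, long = (location.latitude, location.longitude) if location is not None else (np.nan, np.nan)

    dist = tup[0]
    state = tup[1]

    if (tup[2] in years) and (tup[3] in seasons) and (tup[4] in crops):
        # NOTE: 'Production/Area' currently holds the raw production (tup[6]);
        # dividing by the area (tup[5]) would give production per unit area
        rows.append({'State_Name': state,
                     'District_Name': dist,
                     'Latitude': lat,
                     'Longitude': long,
                     tup[2]: True,
                     tup[3]: True,
                     'Crop': tup[4],
                     'Production/Area': tup[6]})

# build the frame once, keeping the column order defined earlier
df2 = pd.DataFrame(rows, columns=df2.columns)
df2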

None
North and Middle Andaman, Andaman and Nicobar Islands, 744210, India
None
Anantapur, Andhra Pradesh, India
Chittoor, Andhra Pradesh, 517001, India
East Godavari, Andhra Pradesh, India
Guntur, Andhra Pradesh, 522001, India
Kadapa, YSR, Andhra Pradesh, 516001, India
Krishna, Andhra Pradesh, India
Kurnool, Andhra Pradesh, 518001, India
Prakasam, Andhra Pradesh, India
Sri Potti Sriramulu Nellore, Andhra Pradesh, India
Srikakulam, Andhra Pradesh, India
None
Vizianagaram, Andhra Pradesh, India
West Godavari, Andhra Pradesh, India
Anjaw, Arunachal Pradesh, 792104, India
Changlang, Changlang HQ, Changlang, Arunachal Pradesh, India
Upper Dibang Valley, Arunachal Pradesh, 792100, India
East Kameng, Arunachal Pradesh, India
East Siang, Upper Siang, Arunachal Pradesh, India
Kurung Kumey district, Arunachal Pradesh, India
Lohit, Arunachal Pradesh, India
Longding, Arunachal Pradesh, India
Lower Dibang Valley, Arunachal Pradesh, India
Lower Subansiri, Kra Daadi, Arunachal Pradesh, India
Namsai, 

Unnamed: 0,State_Name,District_Name,Latitude,Longitude,2011,2012,2013,2014,2015,Summer,Winter,Whole Year,Rabi,Autumn,Kharif,Crop,Production/Area
0,Andhra Pradesh,ANANTAPUR,14.654623,77.556260,True,,,,,,,,,,True,Rice,94531.0
1,Andhra Pradesh,ANANTAPUR,14.654623,77.556260,True,,,,,,,,1.0,,,Rice,25542.0
2,Andhra Pradesh,ANANTAPUR,14.654623,77.556260,True,,,,,,,,1.0,,,Wheat,100.0
3,Andhra Pradesh,ANANTAPUR,14.654623,77.556260,,1.0,,,,,,,,,True,Rice,58691.0
4,Andhra Pradesh,ANANTAPUR,14.654623,77.556260,,1.0,,,,,,,1.0,,,Rice,17768.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4904,West Bengal,PURULIA,23.296146,86.342108,,,1.0,,,,1.0,,,,,Rice,730136.0
4905,West Bengal,PURULIA,23.296146,86.342108,,,,1.0,,,,,,1.0,,Rice,721.0
4906,West Bengal,PURULIA,23.296146,86.342108,,,,1.0,,,,,1.0,,,Wheat,3663.0
4907,West Bengal,PURULIA,23.296146,86.342108,,,,1.0,,1.0,,,,,,Rice,801.0


In [13]:
df2.head()

Unnamed: 0,State_Name,District_Name,Latitude,Longitude,2011,2012,2013,2014,2015,Summer,Winter,Whole Year,Rabi,Autumn,Kharif,Crop,Production/Area
0,Andhra Pradesh,ANANTAPUR,14.654623,77.55626,True,,,,,,,,,,True,Rice,94531.0
1,Andhra Pradesh,ANANTAPUR,14.654623,77.55626,True,,,,,,,,1.0,,,Rice,25542.0
2,Andhra Pradesh,ANANTAPUR,14.654623,77.55626,True,,,,,,,,1.0,,,Wheat,100.0
3,Andhra Pradesh,ANANTAPUR,14.654623,77.55626,,1.0,,,,,,,,,True,Rice,58691.0
4,Andhra Pradesh,ANANTAPUR,14.654623,77.55626,,1.0,,,,,,,1.0,,,Rice,17768.0


In [14]:
df2.shape

(4909, 17)
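
The row-by-row selection used to fill the dataframe can also be written as a boolean mask, which avoids the Python-level loop for the filtering step. The sketch below runs on a small made-up sample rather than the real file; the `Yield` column is an assumption added for illustration, showing how production per unit area could be derived while guarding against a zero area.

```python
import pandas as pd

# Made-up sample with the same columns as the raw data (values are illustrative)
sample = pd.DataFrame({
    'District_Name': ['A', 'A', 'B', 'B'],
    'Crop_Year': [2011, 2005, 2012, 2013],
    'Season': ['Kharif     ', 'Rabi       ', 'Kharif     ', 'Rabi       '],
    'Crop': ['Rice', 'Wheat', 'Maize', 'Wheat'],
    'Area': [100.0, 50.0, 10.0, 0.0],
    'Production': [250.0, 75.0, 20.0, 0.0],
})

years = (2011, 2012, 2013)
seasons = ('Kharif     ', 'Rabi       ')
crops = ('Rice', 'Wheat')

# Keep only rows that match every selection
mask = (sample['Crop_Year'].isin(years)
        & sample['Season'].isin(seasons)
        & sample['Crop'].isin(crops))
filtered = sample.loc[mask].copy()

# Production per unit area; a zero area becomes NaN instead of a division by zero
filtered['Yield'] = filtered['Production'] / filtered['Area'].where(filtered['Area'] > 0)
print(filtered)
```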

### Creating map with Folium

In [15]:
address = 'India'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of India are 22.3511148, 78.6677428.


In [16]:
mean = np.mean(df2['Production/Area'])
mean

145458.44192810458

In [17]:
# [outline colour, fill colour] for up to five selected crops, in selection order
PALETTE = [['blue', '#5dade2'],
           ['white', '#fdfefe'],
           ['black', '#566573'],
           ['green', '#82e0aa'],
           ['yellow', '#f9e79f']]

def color_selection(cr):
    # crops beyond the fifth selection, or not selected at all, fall back to red
    if cr in crops and crops.index(cr) < len(PALETTE):
        return PALETTE[crops.index(cr)]
    return ['red', '#f5b7b1']

In [None]:
for i in years:
    f = folium.Figure(width=400, height=400)
    # create map of India using latitude and longitude values
    map_india = folium.Map(location=[latitude, longitude], 
                           zoom_start=4, 
                           max_bounds=True) 
                           #tiles='Stamen Terrain')

    # add markers to map
    for lat, lng, state, dist, st, cr, pr in zip(df2['Latitude'], 
                                                 df2['Longitude'], 
                                                 df2['State_Name'], 
                                                 df2['District_Name'], 
                                                 df2[i], df2['Crop'], 
                                                 df2['Production/Area']):
        
        # the year flag is True (or 1.0) for matching rows and NaN otherwise;
        # rows with missing production are skipped so the radius is always a number
        if st == True and pd.notna(pr):
            col = color_selection(cr)
            label = '{}, {}, {}'.format(cr, dist, state)
            label = folium.Popup(label, parse_html=True)
            folium.CircleMarker(location=[lat, lng],
                                radius=(pr/mean*2),
                                popup=label,
                                color=col[0],
                                fill=True,
                                fill_color=col[1],
                                fill_opacity=0.5).add_to(map_india)

    f.add_child(map_india)
    display(f)
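
The marker radius above grows linearly with production, so a few very large districts can cover the map while small ones become invisible. One possible alternative, offered only as a sketch, is a square-root scale clamped to a fixed range; `scaled_radius`, `min_radius` and `max_radius` are hypothetical names and the limits are arbitrary choices.

```python
import math

def scaled_radius(production, mean_production, min_radius=2.0, max_radius=20.0):
    """Square-root scaling of production relative to the mean, clamped to a range."""
    # Missing, negative or unscalable values get the smallest marker
    if production is None or math.isnan(production) or production < 0 or mean_production <= 0:
        return min_radius
    radius = 2.0 * math.sqrt(production / mean_production)
    return max(min_radius, min(max_radius, radius))

print(scaled_radius(400.0, 100.0))         # 4.0
print(scaled_radius(float('nan'), 100.0))  # 2.0
print(scaled_radius(1e12, 1.0))            # 20.0
```

With this helper, the `radius=` argument of `folium.CircleMarker` could be set to `scaled_radius(pr, mean)`.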