# FOOD FOR ALL - Analytics Application

**STEP1:- Scan the news collection from the Watson Discovery service using the Discovery API.**

Watson Discovery is an enterprise search engine that can comb through an unstructured "document collection" and retrieve the relevant information for a search query. The sample search topic for this notebook is "migrant labourers india"; the code expresses it as the concept query "enriched_text.concepts.text:migrant india" against the pre-built "news-en" collection. The results are returned as JSON and include the relevant passage text, the link to the news portal and the crawl date. The following code calls the API on a Discovery service instance that we have hosted on IBM Cloud, then parses the JSON and lists the date, source and content of each result.

In [1]:
import json
import subprocess
import sys
subprocess.check_call([sys.executable, "-m", "pip", "install", "ibm_watson"])
from ibm_watson import DiscoveryV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('ABCDEFGHIJKLMNOP')
discovery = DiscoveryV1(
    version='2020-05-31',
    authenticator=authenticator
)
#Set the URL to our discovery instance
discovery.set_service_url('https://api.eu-gb.discovery.watson.cloud.ibm.com/instances/22aecf78-2273-4e59-acec-38635eee713e')

#Query the Discovery service to fetch the data related to migrants
out_str = discovery.query('system', 'news-en', query='enriched_text.concepts.text:migrant india').get_result()


#print(out_str['matching_results'])

for i in out_str['results']:
    #print(json.dumps(i, indent=2) )
    print("Date of incident: " + i['crawl_date'])
    print("Source of news: " + i['url'])
    print("Actual content: " + i['text'])
    print("=============================================")

Date of incident: 2020-05-05T06:57:26Z
Source of news: https://savedelete.com/photos/bihar-migrant-workers-return-to-patna-from-ernakulam-via-a-special-shramik-train-during-the-extended-nationwide-lockdown/311003/
Actual content: Kolkata: Bihar migrant workers return to Patna from Ernakulam via a Special Shramik train during the extended nationwide lockdown, on May 4, 2020. (Photo: IANS) A 1000-bed facility being contructed at the exhibition ground in Mumbai’s Bandra Kurla Complex to provide quarantine and isolation facilities for non-critical COVID-19 patients during
Date of incident: 2020-05-28T13:09:43Z
Source of news: http://www.indiaenvironmentportal.org.in/content/467701/workers-in-the-time-of-covid-19-evidence-from-a-rapid-assessment-in-bihar/
Actual content: India.The study and this report seeks to rapidly discover and brief

**STEP2:- Get latitude and longitude coordinates of location that needs help.**  

For the purpose of this demo, we have manually extracted the named entities of interest to the use case. In the long term, we plan to use NLP and text analytics to do the extraction automatically. For the location "Hsr Layout Karnataka" from Step 1 above, we use the positionstack forward geocoding API to get the coordinates. The API takes a location as a text string and returns the matching places with their latitude and longitude.

In [2]:
import pycurl
from io import BytesIO
buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, 'http://api.positionstack.com/v1/forward?access_key=ABCDEFGHIJKLMNOP&query=Hsr+layout+Karnataka')
c.setopt(c.WRITEDATA, buffer)
c.perform()
c.close()
body = buffer.getvalue()
# body is a byte string and must be decoded before printing.
# The API responds with JSON, so we decode it as UTF-8.
print(body.decode('utf-8'))

{"data":[{"latitude":12.922005,"longitude":77.648683,"type":"venue","name":"Hsr layout","number":"216","postal_code":"560102","street":"24th Main Road","confidence":1,"region":"Karnataka","region_code":"KA","county":"Bangalore","locality":"Bangalore","administrative_area":null,"neighbourhood":null,"country":"India","country_code":"IND","continent":"Asia","label":"Hsr layout, Bangalore, India"},{"latitude":12.904165,"longitude":77.650013,"type":"neighbourhood","name":"HSR Layout","number":null,"postal_code":null,"street":null,"confidence":1,"region":"Karnataka","region_code":"KA","county":"Bangalore","locality":"Bangalore","administrative_area":null,"neighbourhood":"HSR Layout","country":"India","country_code":"IND","continent":"Asia","label":"HSR Layout, Bangalore, India"},{"latitude":12.913739,"longitude":77.637465,"type":"venue","name":"HSR Layout BDA Complex","number":null,"postal_code":"560034","street":"14th Main Road","confidence":1,"region":"Karnataka","region_code":"KA","county
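
The response is printed as raw JSON, so the coordinates still have to be pulled out of it. The sketch below shows one way to do that with the standard `json` module. `sample_body` is a shortened copy of the response above (first two entries, a few fields each), and `first_coordinates` is a hypothetical helper. Taking the first entry assumes the service lists the best match first, which the output above does not confirm.

```python
import json

# Shortened copy of the response shown above (first two entries, a few fields each)
sample_body = b'''{"data":[
 {"latitude":12.922005,"longitude":77.648683,"type":"venue","name":"Hsr layout",
  "confidence":1,"label":"Hsr layout, Bangalore, India"},
 {"latitude":12.904165,"longitude":77.650013,"type":"neighbourhood","name":"HSR Layout",
  "confidence":1,"label":"HSR Layout, Bangalore, India"}
]}'''

def first_coordinates(raw_body):
    """Return (latitude, longitude, label) of the first match, or None if there is none."""
    payload = json.loads(raw_body.decode("utf-8"))
    matches = payload.get("data") or []
    if not matches:
        return None
    top = matches[0]
    return top["latitude"], top["longitude"], top.get("label", "")

coords = first_coordinates(sample_body)
print(coords)
```

Returning `None` for an empty result list lets the caller decide how to handle a location that could not be geocoded.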

**STEP3:- Read the NGO/Foundation data from Cloud Object Storage into a DataFrame.**

We have used Cloud Object Storage (COS) as the data store for the purpose of this demo. A CSV file containing the name, city and latitude/longitude of each NGO is read into a DataFrame (ngo_df) so that it can be used further for geospatial analytics.

In [3]:
from pyst import STContext
# Register STContext, which is the main entry point
stc = STContext(spark.sparkContext._gateway)

In [4]:
# Read ngo csv file
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.
client_f27f0e76464c443c9c6ab99d6ad324d8 = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='ABCDEFHIJKLMNOP',
    ibm_auth_endpoint="https://iam.cloud.ibm.com/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.service.networklayer.com')

body = client_f27f0e76464c443c9c6ab99d6ad324d8.get_object(Bucket='foodforall-donotdelete-pr-rnf4ycyspogcjz',Key='NGO.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

ngo_df = pd.read_csv(body)
ngo_df

Unnamed: 0,name,city,latitude,longitude
0,Akshaya Patra Foundation,Bengaluru,13.017672,77.54982
1,the Attria Foundation,Bengaluru,13.032518,77.592124
2,Humanity First,Hyderabad,17.385044,78.486671
3,Habitat for Humanity India,Bengaluru,13.026731,77.631395
4,Oxfam India,Bengaluru,13.016299,77.636412
5,SEEDS,New Delhi,28.5888,77.084866
6,Humane Universal Good Deeds Network,Bengaluru,12.912118,77.644555
7,Rise Against Hunger (RAHI),Bengaluru,13.018392,77.653632


**STEP4:- Read the location data from Cloud Object Storage into a DataFrame.**

We have used Cloud Object Storage (COS) as the data store for the purpose of this demo. A CSV file containing the name and latitude/longitude of each location that is seeking aid is read into a DataFrame (migrant_df) so that it can be used further for geospatial analytics.

In [5]:
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.
client_3d68c5c911bc498ba98b556479cc036a = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='ABCDEFGHIJKLMNOP',
    ibm_auth_endpoint="https://iam.cloud.ibm.com/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.service.networklayer.com')

body = client_3d68c5c911bc498ba98b556479cc036a.get_object(Bucket='projectsurbhi-donotdelete-pr-iywkvi1mn5xqjc',Key='migrant.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

migrant_df = pd.read_csv(body)
migrant_df.head()

Unnamed: 0,city,Location,latitude,longitude,supplies
0,Bengaluru,HSR Layout Sector 7,12.91634,77.649467,105
1,Bengaluru,HSR Layout Sector 1,12.91614,77.623459,119
2,Bengaluru,Shanti Nagar,12.989158,77.606089,50
3,Bengaluru,Yeshwantpur,13.02383,77.552921,550
4,Bengaluru,Mahalaxmi Layout,13.012419,77.543993,490


In [6]:
# Convert pandas dataframe to geo dataframe for NGO
import csv, json
from geojson import Feature, FeatureCollection, Point
features = []
for index, row in ngo_df.iterrows():
    latitude, longitude = map(float, (row['latitude'], row['longitude']))
    features.append(
        Feature(
            geometry = Point((longitude, latitude)),
            properties = {
                'name': row['name'],
                'city': row['city'],
                'latitude': row['latitude'],
                'longitude': row['longitude']
            }
        )
    )
ngo_json = FeatureCollection(features)
ngo_df = stc.geojson_reader().read(ngo_json)
ngo_df



Unnamed: 0,name,longitude,city,latitude,geometry
0,Akshaya Patra Foundation,77.54982,Bengaluru,13.017672,"Point(13.017672, 77.54982)"
1,the Attria Foundation,77.592124,Bengaluru,13.032518,"Point(13.032518, 77.592124)"
2,Humanity First,78.486671,Hyderabad,17.385044,"Point(17.385044, 78.486671)"
3,Habitat for Humanity India,77.631395,Bengaluru,13.026731,"Point(13.026731, 77.631395)"
4,Oxfam India,77.636412,Bengaluru,13.016299,"Point(13.016299, 77.63641199999999)"
5,SEEDS,77.084866,New Delhi,28.5888,"Point(28.5888, 77.084866)"
6,Humane Universal Good Deeds Network,77.644555,Bengaluru,12.912118,"Point(12.912118, 77.64455500000001)"
7,Rise Against Hunger (RAHI),77.653632,Bengaluru,13.018392,"Point(13.018392, 77.653632)"


In [7]:
# Convert pandas dataframe to geo dataframe for migrant locations
features = []
for index, row in migrant_df.iterrows():
    latitude, longitude = map(float, (row['latitude'], row['longitude']))
    features.append(
        Feature(
            geometry = Point((longitude, latitude)),
            properties = {
                'Location': row['Location'],
                'city': row['city'],
                'latitude': row['latitude'],
                'longitude': row['longitude']
            }
        )
    )
migrant_json = FeatureCollection(features)
migrant_df = stc.geojson_reader().read(migrant_json)
migrant_df



Unnamed: 0,Location,longitude,city,latitude,geometry
0,HSR Layout Sector 7,77.649467,Bengaluru,12.91634,"Point(12.91634, 77.649467)"
1,HSR Layout Sector 1,77.623459,Bengaluru,12.91614,"Point(12.916139999999999, 77.623459)"
2,Shanti Nagar,77.606089,Bengaluru,12.989158,"Point(12.98915835, 77.60608918)"
3,Yeshwantpur,77.552921,Bengaluru,13.02383,"Point(13.02383, 77.5529215)"
4,Mahalaxmi Layout,77.543993,Bengaluru,13.012419,"Point(13.01241878, 77.543993)"
5,Yelahanka,77.606894,Bengaluru,13.078474,"Point(13.0784743, 77.6068938)"
6,Mubarak Nagar,77.2274,Delhi,28.5755,"Point(28.5755, 77.2274)"
7,Vasant Vihar,77.161174,Delhi,28.560802,"Point(28.5608021, 77.1611738)"
8,Chandani Cahwk,77.231,Delhi,28.656,"Point(28.656, 77.23100000000001)"
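
The two conversion cells above repeat the same logic for NGOs and for migrant locations. As a sketch of how that could be shared, the helper below builds a GeoJSON FeatureCollection from row dictionaries using plain Python dicts. `rows_to_feature_collection` is a hypothetical name, and whether `stc.geojson_reader().read` accepts plain dicts in place of `geojson` objects is an assumption that has not been verified here.

```python
def rows_to_feature_collection(rows, property_keys):
    """Build a GeoJSON FeatureCollection (as plain dicts) from row dictionaries."""
    features = []
    for row in rows:
        lat, lon = float(row["latitude"]), float(row["longitude"])
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {key: row[key] for key in property_keys},
        })
    return {"type": "FeatureCollection", "features": features}

# Two rows copied from the NGO table above
ngo_rows = [
    {"name": "Akshaya Patra Foundation", "city": "Bengaluru",
     "latitude": 13.017672, "longitude": 77.54982},
    {"name": "SEEDS", "city": "New Delhi",
     "latitude": 28.5888, "longitude": 77.084866},
]
ngo_fc = rows_to_feature_collection(ngo_rows, ["name", "city", "latitude", "longitude"])
print(ngo_fc["features"][0]["geometry"])
```

With a pandas DataFrame, `ngo_df.to_dict("records")` yields rows in the shape this helper expects. Note the coordinate order: GeoJSON stores longitude first, while the geometry column in the tables above displays each point as (latitude, longitude).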


**STEP5:- Find nearest NGOs for the location that needs help.**

This is the most crucial step of the analytics piece. We want food to be delivered fresh and as quickly as possible to the location. For this to work, we need to notify the NGOs that are geographically near to the location. We use spatio-temporal functions to find the nearest neighbours. For the purpose of this demo, we find the NGOs within a 7 km radius of each location. The result shows that there is one NGO within range of the "HSR Layout" locations in Bangalore, and none within range of the Delhi locations.

In [8]:
tile_size = 100000 #in meters
si = stc.tessellation_index(tile_size=tile_size) # we leave bbox as None to use full earth as boundingbox
si.from_df(ngo_df, 'name', 'geometry', verbosity='error') #Populate the spatial index

nearest_ngo_name=[]
for row in migrant_df.itertuples():
    loc=row.geometry

    #Find NGOs within a 7 km radius of the given point
    nearest_ngo = si.within_distance_with_info(loc, 7000)
    nearest_points=[]
    for i in nearest_ngo: 
        nearest_points.append(i[0])
    nearest_ngo_name.append(nearest_points)
# Add new column with names of NGO within the range
migrant_df['NGO within range'] = nearest_ngo_name
migrant_df

8 entries processed, 8 entries successfully added


Unnamed: 0,Location,longitude,city,latitude,geometry,NGO within range
0,HSR Layout Sector 7,77.649467,Bengaluru,12.91634,"Point(12.91634, 77.649467)",[Humane Universal Good Deeds Network]
1,HSR Layout Sector 1,77.623459,Bengaluru,12.91614,"Point(12.916139999999999, 77.623459)",[Humane Universal Good Deeds Network]
2,Shanti Nagar,77.606089,Bengaluru,12.989158,"Point(12.98915835, 77.60608918)","[Rise Against Hunger (RAHI), Habitat for Human..."
3,Yeshwantpur,77.552921,Bengaluru,13.02383,"Point(13.02383, 77.5529215)","[the Attria Foundation, Akshaya Patra Foundation]"
4,Mahalaxmi Layout,77.543993,Bengaluru,13.012419,"Point(13.01241878, 77.543993)","[the Attria Foundation, Akshaya Patra Foundation]"
5,Yelahanka,77.606894,Bengaluru,13.078474,"Point(13.0784743, 77.6068938)","[Habitat for Humanity India, the Attria Founda..."
6,Mubarak Nagar,77.2274,Delhi,28.5755,"Point(28.5755, 77.2274)",[]
7,Vasant Vihar,77.161174,Delhi,28.560802,"Point(28.5608021, 77.1611738)",[]
8,Chandani Cahwk,77.231,Delhi,28.656,"Point(28.656, 77.23100000000001)",[]
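
As a cross-check on the radius query, the same "within 7 km" test can be reproduced with a plain great-circle (haversine) distance. This is a swapped-in technique for illustration, not the method used by the spatial index above, and it treats the Earth as a sphere. The coordinates are copied from the tables in this notebook; `haversine_m` is a helper name introduced here.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

hsr_sector_7 = (12.91634, 77.649467)  # (latitude, longitude) from the migrant table
ngos = {
    "Humane Universal Good Deeds Network": (12.912118, 77.644555),
    "Oxfam India": (13.016299, 77.636412),
}
radius_m = 7000
in_range = [name for name, (lat, lon) in ngos.items()
            if haversine_m(*hsr_sector_7, lat, lon) <= radius_m]
print(in_range)  # ['Humane Universal Good Deeds Network']
```

For HSR Layout Sector 7, Humane Universal Good Deeds Network is well under 1 km away, while Oxfam India is more than 11 km away, which agrees with the single NGO listed for that row above. A brute-force loop like this is fine for a handful of points; the tessellation index in the cell above is the better fit once the number of NGOs and locations grows.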


**STEP6:- Visualization - Part1.**

A picture is worth a thousand words, and a map can show at a glance what the code computes. The following lines of code plot the locations needing help and the NGOs that offer help on a map. This helps to make out clusters that are sparsely spread and ones that need more attention. The NGO locations are marked in BLUE and migrant locations are marked in RED.

In [9]:
import folium
m1 = folium.Map([13.01767,77.54990])

locs_ngo = zip(ngo_df.latitude, ngo_df.longitude)
locs_migrants = zip(migrant_df.latitude, migrant_df.longitude)
for location in locs_ngo:
    folium.CircleMarker(location=location, color="blue",   radius=4).add_to(m1)
for location in locs_migrants:
    folium.CircleMarker(location=location, color="red",   radius=4).add_to(m1)

m1

**STEP7:- Visualization-Part2** 

The following lines of code plot the connections between NGOs and locations. This shows the nearest NGOs for each location on the India map, a visual representation of the crux of the analytics piece. This data is then fed to a database. The FoodForAll system can then pick the data from the database to display on the web portal.

In [10]:
from geojson import MultiLineString,LineString

nearest_points=[]
for row in migrant_df.itertuples():
    loc=row.geometry
    nearest_ngo = si.within_distance_with_info(loc, 7000)
    for i in nearest_ngo: 
        point1=[]
        temp=[]
        point2=[]
        point1.append(loc.get_longitude())
        point1.append(loc.get_latitude())
        point2.append(i[1].get_longitude())
        point2.append(i[1].get_latitude())
        temp.append(point1)
        temp.append(point2)
        nearest_points.append(temp)
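# Note: nearest_points holds the lines for all locations, so every row receives the same full MultiLineString
migrant_df['ngo_geo'] = migrant_df.apply(lambda row: MultiLineString(nearest_points), axis=1)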
migrant_df



Unnamed: 0,Location,longitude,city,latitude,geometry,NGO within range,ngo_geo
0,HSR Layout Sector 7,77.649467,Bengaluru,12.91634,"Point(12.91634, 77.649467)",[Humane Universal Good Deeds Network],"{'type': 'MultiLineString', 'coordinates': [[[..."
1,HSR Layout Sector 1,77.623459,Bengaluru,12.91614,"Point(12.916139999999999, 77.623459)",[Humane Universal Good Deeds Network],"{'type': 'MultiLineString', 'coordinates': [[[..."
2,Shanti Nagar,77.606089,Bengaluru,12.989158,"Point(12.98915835, 77.60608918)","[Rise Against Hunger (RAHI), Habitat for Human...","{'type': 'MultiLineString', 'coordinates': [[[..."
3,Yeshwantpur,77.552921,Bengaluru,13.02383,"Point(13.02383, 77.5529215)","[the Attria Foundation, Akshaya Patra Foundation]","{'type': 'MultiLineString', 'coordinates': [[[..."
4,Mahalaxmi Layout,77.543993,Bengaluru,13.012419,"Point(13.01241878, 77.543993)","[the Attria Foundation, Akshaya Patra Foundation]","{'type': 'MultiLineString', 'coordinates': [[[..."
5,Yelahanka,77.606894,Bengaluru,13.078474,"Point(13.0784743, 77.6068938)","[Habitat for Humanity India, the Attria Founda...","{'type': 'MultiLineString', 'coordinates': [[[..."
6,Mubarak Nagar,77.2274,Delhi,28.5755,"Point(28.5755, 77.2274)",[],"{'type': 'MultiLineString', 'coordinates': [[[..."
7,Vasant Vihar,77.161174,Delhi,28.560802,"Point(28.5608021, 77.1611738)",[],"{'type': 'MultiLineString', 'coordinates': [[[..."
8,Chandani Cahwk,77.231,Delhi,28.656,"Point(28.656, 77.23100000000001)",[],"{'type': 'MultiLineString', 'coordinates': [[[..."
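
In the cell above, `nearest_points` is filled across all locations before the `apply` call, so each row ends up with the same set of lines (which is why the Delhi rows, with no NGO in range, still show coordinates). That is enough for drawing the map. If per-location geometry is wanted, for example when writing rows to the database, one option is to build the coordinates for each location separately. The sketch below uses plain dicts and hypothetical input data; `lines_for_location` is a name introduced here for illustration.

```python
def lines_for_location(location, ngo_points):
    """Return MultiLineString coordinates linking one location to each nearby NGO.

    Inputs are (latitude, longitude) pairs; GeoJSON positions are [longitude, latitude].
    """
    lat, lon = location
    return [[[lon, lat], [ngo_lon, ngo_lat]] for ngo_lat, ngo_lon in ngo_points]

# Hypothetical input: each location paired with the NGOs found within range
nearby = {
    (12.91634, 77.649467): [(12.912118, 77.644555)],  # HSR Layout Sector 7
    (28.5755, 77.2274): [],                            # Mubarak Nagar, no NGO in range
}
per_location = {
    loc: {"type": "MultiLineString", "coordinates": lines_for_location(loc, ngo_points)}
    for loc, ngo_points in nearby.items()
}
for loc, geom in per_location.items():
    print(loc, len(geom["coordinates"]))
```

A location with no NGO in range then gets an empty coordinate list, which makes it easy to filter out rows that have nothing to draw.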


In [12]:
!pip install geopandas
import geopandas as gpd
# Use the plain "EPSG:4326" string; the {"init": ...} dict form is deprecated in pyproj 2.x
line_gdf=gpd.GeoDataFrame(migrant_df, geometry=migrant_df.ngo_geo, crs="EPSG:4326")
line_gdf
folium.GeoJson(line_gdf).add_to(m1)


<folium.features.GeoJson at 0x7f4ec1991d30>

In [13]:
m1