# Natural Language Processing using the OpenAI API

This notebook shows how to extract structured information from news articles using the OpenAI API. The notebook is based on the excellent course [ChatGPT Prompt Engineering for Developers](https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/) by Andrew Ng.

## Setup

In [None]:
%%capture
if 'google.colab' in str(get_ipython()):
  !apt install libspatialindex-dev
  !pip install fiona shapely pyproj rtree mapclassify
  !pip install geopandas
  # get_completion() below uses openai.ChatCompletion, which was removed in openai 1.0
  !pip install "openai<1"
  !pip install geopy

In [None]:
import openai
import os
import json
import pandas as pd
from geopy.geocoders import GoogleV3
from geopy.extra.rate_limiter import RateLimiter
import folium
from folium import Figure

import geopandas as gpd



Add your OpenAI API key below. You need to [sign up](https://platform.openai.com/signup) and obtain a key, which requires setting up a billing account. If you only want to experiment, you can use the free environment provided by the [ChatGPT Prompt Engineering for Developers](https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/) course.

Add your Google Maps API key below. This requires [signing up](https://console.cloud.google.com/) in the Google Cloud Console and setting up a billing account. Once done, enable the Geocoding API and create a key.

In [None]:
openai.api_key  = ''
google_maps_api_key = ''
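
As an alternative to pasting keys into the notebook, the keys can be read from environment variables. The following is a minimal sketch; the helper `read_key` and the variable name `GOOGLE_MAPS_API_KEY` are assumptions made for illustration, so substitute whatever names you actually exported.

```python
import os

def read_key(name, default=''):
    """Return the API key stored in environment variable `name`, or `default`."""
    # strip() removes stray whitespace or newlines picked up when exporting the key
    return os.environ.get(name, default).strip()

# Hypothetical variable names; adjust them to match your environment.
openai_api_key = read_key('OPENAI_API_KEY')
google_maps_api_key = read_key('GOOGLE_MAPS_API_KEY')
```

With this in place, `openai.api_key = openai_api_key` would replace the hard-coded empty string above.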

Define a helper function that sends a prompt to the model and returns the text of its reply.

In [None]:
def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0, # this is the degree of randomness of the model's output
    )
    return response.choices[0].message["content"]
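
API calls can fail intermittently, for example because of rate limits or network errors. The sketch below shows one possible way to retry a call with exponential backoff. `with_retries` is a hypothetical helper, not part of the `openai` library, and catching every `Exception` is a simplification.

```python
import time

def with_retries(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn() until it succeeds, waiting base_delay * 2**attempt between tries."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            # Re-raise on the final attempt so the caller sees the real error.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

It could be used as `with_retries(lambda: get_completion(prompt))`. The `sleep` parameter exists only so the waiting behaviour can be swapped out, for instance in tests.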

## Load Data

In [None]:
data_folder = 'data'
output_folder = 'output'

# Create the folders if they do not already exist
os.makedirs(data_folder, exist_ok=True)
os.makedirs(output_folder, exist_ok=True)

In [None]:
def download(url):
    filename = os.path.join(data_folder, os.path.basename(url))
    if not os.path.exists(filename):
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print('Downloaded ' + local)

data_url = 'https://github.com/spatialthoughts/python-tutorials/raw/main/data/'

articles = ['article1.txt', 'article2.txt', 'article3.txt']

for article in articles:
  download(data_url + article)


Downloaded data/article1.txt
Downloaded data/article2.txt
Downloaded data/article3.txt


## Get AI Predictions

Read the data.

In [None]:
articles_texts = []

for article in articles:
  path = os.path.join(data_folder, article)
  # Use a context manager so each file is closed after reading
  with open(path, 'r', encoding='utf-8') as f:
    articles_texts.append(f.read())

Display an excerpt of the first article.

In [None]:
print(articles_texts[0][:250])

Title:
2 Persons Trampled To Death By Elephants In 2 Days In Odisha’s Dhenkanal

Description:
Dhenkanal: Human casualty due to elephant attack continued in Odisha’s Dhenkanal district as a man was trampled to death by a herd on Saturday.
According to



We design a prompt to extract specific information from the news article in JSON format.

In [None]:
results = []

for article_text in articles_texts:
  prompt = f"""
    Identify the following items from the news article
    - Location of the incident
    - Number of people injured
    - Number of people killed
    - Short summary

    The news article is delimited with triple single quotes.
    Format your response as a JSON object with 'location', 'num_injured', 'num_killed' and 'summary' as the keys. \
    If the information isn't present, use "unknown" as the value.
    Make your response as short as possible.

    News article: '''{article_text}'''
  """
  response = get_completion(prompt)
  results.append(json.loads(response))
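
Calling `json.loads` directly on the reply assumes the model returns nothing but a valid JSON object. Models sometimes wrap the object in code fences or omit a key, so a more defensive parser may be useful. The following is a sketch only; `parse_response` and `EXPECTED_KEYS` are hypothetical names introduced here.

```python
import json

EXPECTED_KEYS = ('location', 'num_injured', 'num_killed', 'summary')

def parse_response(text):
    """Parse the model reply into a dict containing the four expected keys."""
    # Keep only the span from the first '{' to the last '}', which drops
    # any code fences or commentary around the JSON object.
    start, end = text.find('{'), text.rfind('}')
    data = {}
    if start != -1 and end > start:
        try:
            data = json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            data = {}
    # Fill in any key the model left out, matching the prompt's convention.
    return {key: data.get(key, 'unknown') for key in EXPECTED_KEYS}
```

In the loop above, `results.append(parse_response(response))` would then take the place of the direct `json.loads` call.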

In [None]:
# Each result is a dict, so the list converts directly into rows
df = pd.DataFrame(results)
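
Because the prompt asks for `"unknown"` when a value is missing, the `num_injured` and `num_killed` columns can mix numbers and strings. If numeric counts are needed later, a small converter along these lines may help. `to_count` is a hypothetical helper, and mapping unknown values to `None` is an assumption about what downstream code expects.

```python
def to_count(value):
    """Return value as an int, or None when it is not an integer literal."""
    try:
        return int(str(value).strip())
    except ValueError:
        # Covers 'unknown' and any other non-integer text
        return None
```

It could be applied with `df['num_killed'] = df['num_killed'].map(to_count)`.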

## Geocode Locations

In [None]:
locator = GoogleV3(api_key=google_maps_api_key)
geocode_fn = RateLimiter(locator.geocode, min_delay_seconds=2)

df['geocoded'] = df['location'].apply(geocode_fn)
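
Each geocoding request is billed and rate-limited, so when several articles mention the same place it can be worthwhile to look up each distinct location only once. A minimal caching wrapper might look like the sketch below; `cached` is a hypothetical helper, not part of `geopy`.

```python
def cached(fn):
    """Wrap fn so that each distinct query string is looked up only once."""
    cache = {}

    def wrapper(query):
        # Call the underlying function only for queries not seen before
        if query not in cache:
            cache[query] = fn(query)
        return cache[query]

    return wrapper
```

Under that assumption, the rate-limited geocoder would be wrapped as `geocode_fn = cached(RateLimiter(locator.geocode, min_delay_seconds=2))`.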


In [None]:
df

| | location | num_injured | num_killed | summary | geocoded |
|---|---|---|---|---|---|
| 0 | Dhenkanal, Odisha | unknown | 2 | Two people were trampled to death by elephants... | (Dhenkanal, Odisha, India, (20.6504753, 85.598... |
| 1 | Jharkhand's Latehar district | unknown | 3 | Three members of a family, including a three-y... | (Latehar, Jharkhand, India, (23.7555791, 84.35... |
| 2 | Perumugai, T.N. Palayam block, Tamil Nadu, India | unknown | 1 | Wild elephant Karuppan trampled a daily wage w... | (Perumugai, Tamil Nadu, India, (11.5187553, 77... |


In [None]:
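# Convert each geocoded result into a (latitude, longitude, altitude) tuple
df['point'] = df['geocoded'].apply(lambda loc: tuple(loc.point) if loc else None)
# Split the tuples into separate columns
df[['latitude', 'longitude', 'altitude']] = pd.DataFrame(df['point'].tolist(), index=df.index)
# Keep only the columns needed for mapping
df = df[['location', 'num_injured', 'num_killed', 'summary', 'latitude', 'longitude']]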

In [None]:
geometry = gpd.points_from_xy(df.longitude, df.latitude)
gdf = gpd.GeoDataFrame(df, crs='EPSG:4326', geometry=geometry)
gdf

| | location | num_injured | num_killed | summary | latitude | longitude | geometry |
|---|---|---|---|---|---|---|---|
| 0 | Dhenkanal, Odisha | unknown | 2 | Two people were trampled to death by elephants... | 20.650475 | 85.598122 | POINT (85.59812 20.65048) |
| 1 | Jharkhand's Latehar district | unknown | 3 | Three members of a family, including a three-y... | 23.755579 | 84.354205 | POINT (84.35420 23.75558) |
| 2 | Perumugai, T.N. Palayam block, Tamil Nadu, India | unknown | 1 | Wild elephant Karuppan trampled a daily wage w... | 11.518755 | 77.464461 | POINT (77.46446 11.51876) |


In [None]:
bounds = gdf.total_bounds

fig = Figure(width=800, height=400)

m = folium.Map()
m.fit_bounds([[bounds[1],bounds[0]], [bounds[3],bounds[2]]])

gdf.explore(
    m=m,
    tooltip=['location', 'num_killed'],
    popup=['location', 'num_killed'],
    marker_kwds=dict(radius=5))

fig.add_child(m)
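
The index shuffling in the `fit_bounds` call above is needed because `total_bounds` returns `(minx, miny, maxx, maxy)`, that is, longitude before latitude, while folium expects `[[south, west], [north, east]]`. A small helper can make that reordering explicit; `to_folium_bounds` is a hypothetical name used only for this sketch.

```python
def to_folium_bounds(total_bounds):
    """Convert (minx, miny, maxx, maxy) to [[south, west], [north, east]]."""
    minx, miny, maxx, maxy = total_bounds
    # x is longitude and y is latitude, so each corner is swapped to (lat, lon)
    return [[miny, minx], [maxy, maxx]]
```

With it, the call would read `m.fit_bounds(to_folium_bounds(gdf.total_bounds))`.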