# SUMMARY

## SOURCE: GNews API

### Good:
* The only API that returns the full article content (when `expand=content` is set)
* Good options for filtering and customising queries
* Can search for keywords in titles, descriptions and article content, which allows filtering down to human trafficking articles
* Gives a link to the image associated with each article, which would be useful to display alongside the article in a dashboard


### Bad:
* Doesn't provide a rank for each news source, or a score for how relevant each article is to your query.
* The "country" field is the country of the publisher, not the location the article is written about (e.g. bbc-news/us-and-canada articles are classified as country=GB).
* Results may lag behind the article publish date by several days (unconfirmed).

In [63]:
import json
import urllib.request
from pprint import pprint

In [4]:
API_key = "e3cc2c83c86979be51f685512b43974c"

# SEARCH FIELDS

* <b> q </b> - The search keywords used to find news articles. The keywords are used to return the most relevant articles. Logical operators can be combined with keywords; see the query syntax section of the GNews documentation.
* <b> lang </b> - The language of the returned articles, given as a two-letter language code.
* <b> country </b> - The country in which the returned articles were published, given as a two-letter country code. The contents of the articles are not necessarily related to that country.
* <b> max </b> - The number of articles to return, from 1 to 100. The value you can set depends on your subscription.
* <b> in </b> - The attributes in which the keywords are searched (e.g. title,description). The allowed attributes are title, description and content; combine several by separating them with commas.
* <b> nullable </b> - The attributes that are allowed to be returned as null. The allowed attributes are title, description and content; combine several by separating them with commas.
* <b> from </b> - Keeps only articles with a publication date greater than or equal to the specified value.
* <b> to </b> - Keeps only articles with a publication date less than or equal to the specified value. The date must respect the following format:
* <b> sortby </b> - The type of sorting applied to the returned articles.
* <b> page </b> - This parameter only works if a paid subscription is active on your account.
* <b> expand </b> - Returns the full content of the articles in addition to the other data. To get the full content, set this parameter to content.
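
As a rough illustration of how the parameters above combine into a request, the sketch below builds a search URL with `urllib.parse.urlencode` so that quotes, spaces and commas are all percent-encoded. The helper name `build_search_url` and the `YOUR_API_KEY` placeholder are assumptions made for this example; only the endpoint and parameter names come from this notebook.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://gnews.io/api/v4/search"  # endpoint used in the cells of this notebook


def build_search_url(token, q, **params):
    """Hypothetical helper: return a search URL with every parameter percent-encoded."""
    fields = {"q": q, "token": token}
    # Skip parameters left as None so they are not sent at all.
    fields.update({name: value for name, value in params.items() if value is not None})
    # quote_via=quote encodes spaces as %20 (the default would use "+").
    return f"{BASE_URL}?{urlencode(fields, quote_via=quote)}"


example_url = build_search_url(
    "YOUR_API_KEY",  # placeholder, not a real key
    '"human trafficking" OR "modern slavery"',
    lang="en",
    country="gb",
    max=10,
    # "in" is a Python keyword, so it is passed through a dict instead.
    **{"in": "title,description"},
)
print(example_url)
```

Encoding through `urlencode` avoids having to remember which characters need escaping by hand, at the cost of a slightly less readable URL.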

# RETURNED FIELDS

* <b> title </b> - The main title of the article.
* <b> description </b> - The small paragraph under the title.
* <b> content </b> - All the content of the article. The content is truncated if the expand parameter is not set.
* <b> url </b> - The URL of the article.
* <b> image </b> - The main image of the article.
* <b> publishedAt </b> - The date of publication of the article. The date is always in the UTC time zone.
* <b> source.name </b> - The name of the source.
* <b> source.url </b> - The home page of the source.
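
One way to work with these fields downstream (for example in a dashboard) is to map each raw article onto a small typed record. The sketch below is only an illustration: the `Article` dataclass and `parse_article` function are hypothetical names, and the sample values are copied from the `country=gb` response shown later in this notebook.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Article:
    """Hypothetical container mirroring the returned fields listed above."""
    title: str
    description: str
    content: str
    url: str
    image: str
    published_at: datetime
    source_name: str
    source_url: str


def parse_article(raw):
    """Convert one raw article dict from the response into an Article."""
    # publishedAt is always UTC, so attach the UTC time zone explicitly.
    published = datetime.strptime(raw["publishedAt"], "%Y-%m-%dT%H:%M:%SZ")
    return Article(
        title=raw["title"],
        description=raw["description"],
        content=raw["content"],
        url=raw["url"],
        image=raw["image"],
        published_at=published.replace(tzinfo=timezone.utc),
        source_name=raw["source"]["name"],
        source_url=raw["source"]["url"],
    )


# Sample article taken from the country=gb response in this notebook.
sample = {
    "title": "Nova Scotia man charged in London sex trafficking investigation",
    "description": "Your Local News Network serving London, Windsor, Chatham, Sarnia and Midwestern Ontario",
    "content": "Cortez Downey, 26, of Nova Scotia. Photo courtesy of London police.\nShare via: Facebook\nTwitter\nLinkedIn\nMore\nLondon police believe there could be more victims after a Nova Scotia man was arrested for sex trafficking women in the city this summer.\nPo... [961 chars]",
    "url": "https://blackburnnews.com/london/london-news/2022/10/28/nova-scotia-man-charged-london-sex-trafficking-investigation/",
    "image": "https://blackburnnews.com/wp-content/uploads/2022/10/downey-400x250.png",
    "publishedAt": "2022-10-28T11:35:44Z",
    "source": {"name": "BlackburnNews.com", "url": "https://blackburnnews.com"},
}

article = parse_article(sample)
print(article.source_name, article.published_at.isoformat())
```

This sketch assumes every field is present; if the `nullable` parameter is used, the corresponding attributes would need to accept `None`.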

# QUERY THE API FOR HUMAN TRAFFICKING RELATED ARTICLES

In [21]:
url

'https://gnews.io/api/v4/search?q="Apple iPhone 13" AND NOT "Apple iPhone 14"&token=e3cc2c83c86979be51f685512b43974c&lang=en&country=us&max=100'

In [53]:
keywords = [
    "human trafficking", "modern slavery", "sexual exploitation", "trafficking", "domestic servitude",
    "forced labour", "debt bondage", "forced begging", "forced marriage", #"organ removal"
]

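# Wrap each keyword in double quotes (exact-phrase search) and join the phrases with OR.
query = '"' + '" OR "'.join(keywords) + '"'
# Percent-encode the spaces so the query can be placed directly in the URL.
query = query.replace(" ", "%20")
query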

'"human%20trafficking"%20OR%20"modern%20slavery"%20OR%20"sexual%20exploitation"%20OR%20"trafficking"%20OR%20"domestic%20servitude"%20OR%20"forced%20labour"%20OR%20"debt%20bondage"%20OR%20"forced%20begging"%20OR%20"forced%20marriage"'

In [54]:
url = (
    f"https://gnews.io/api/v4/search?"
    f'q={query}&'
    f"token={API_key}&"
    "lang=en&"
    "country=us&"
    "max=100"
)

url

'https://gnews.io/api/v4/search?q="human%20trafficking"%20OR%20"modern%20slavery"%20OR%20"sexual%20exploitation"%20OR%20"trafficking"%20OR%20"domestic%20servitude"%20OR%20"forced%20labour"%20OR%20"debt%20bondage"%20OR%20"forced%20begging"%20OR%20"forced%20marriage"&token=e3cc2c83c86979be51f685512b43974c&lang=en&country=us&max=100'

In [55]:
with urllib.request.urlopen(url) as response:
    data = json.loads(response.read().decode("utf-8"))
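
The request above raises an exception if the API rejects the call or the network is unavailable. A minimal sketch of a more defensive version follows; `fetch_json` is a hypothetical helper name, and the `opener` argument exists only so the function can be exercised without a live request.

```python
import json
import urllib.error
import urllib.request


def fetch_json(url, opener=urllib.request.urlopen, timeout=30):
    """Fetch `url` and return the decoded JSON body, or None if the request fails."""
    try:
        with opener(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Raised for HTTP error status codes; must be caught before URLError,
        # because HTTPError is a subclass of URLError.
        print(f"HTTP {err.code}: {err.reason}")
    except urllib.error.URLError as err:
        # Connection-level failures such as DNS errors or refused connections.
        print(f"Request failed: {err.reason}")
    return None
```

With this helper the cell above could become `data = fetch_json(url)`, followed by a check that `data` is not `None` before reading `data["articles"]`.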

In [56]:
data

{'totalArticles': 2982,
 'articles': [{'title': '‘Doc’ Antle wildlife trafficking trial delayed until June 2023',
   'description': 'The trial for the Myrtle Beach man made famous in the Netflix series, “Tiger King” scheduled to begin Monday in Virginia is delayed. Again.',
   'content': 'MYRTLE BEACH, S.C. (WMBF) - The trial for the Myrtle Beach man made famous in the Netflix series, “Tiger King” scheduled to begin Monday in Virginia is delayed. Again.\nThe Frederick County Circuit Court confirmed the trial is rescheduled for June 12 ... [964 chars]',
   'url': 'https://www.wistv.com/2022/10/31/doc-antle-wildlife-trafficking-trial-delayed-until-june-2023/',
   'image': 'https://gray-wistv-prod.cdn.arcpublishing.com/resizer/lV73Ott2ShVZIN94OEjBV1tIRL4=/1200x600/smart/filters:quality(85)/cloudfront-us-east-1.images.arcpublishing.com/gray/DTE6VVGXOJD3NLDME5FAJFYT4A.png',
   'publishedAt': '2022-10-31T04:00:00Z',
   'source': {'name': 'WIS10', 'url': 'https://www.wistv.com'}},
  {'title':

In [64]:
articles = data["articles"]

pprint(articles[0])

{'content': 'MYRTLE BEACH, S.C. (WMBF) - The trial for the Myrtle Beach man '
            'made famous in the Netflix series, “Tiger King” scheduled to '
            'begin Monday in Virginia is delayed. Again.\n'
            'The Frederick County Circuit Court confirmed the trial is '
            'rescheduled for June 12 ... [964 chars]',
 'description': 'The trial for the Myrtle Beach man made famous in the Netflix '
                'series, “Tiger King” scheduled to begin Monday in Virginia is '
                'delayed. Again.',
 'image': 'https://gray-wistv-prod.cdn.arcpublishing.com/resizer/lV73Ott2ShVZIN94OEjBV1tIRL4=/1200x600/smart/filters:quality(85)/cloudfront-us-east-1.images.arcpublishing.com/gray/DTE6VVGXOJD3NLDME5FAJFYT4A.png',
 'publishedAt': '2022-10-31T04:00:00Z',
 'source': {'name': 'WIS10', 'url': 'https://www.wistv.com'},
 'title': '‘Doc’ Antle wildlife trafficking trial delayed until June 2023',
 'url': 'https://www.wistv.com/2022/10/31/doc-antle-wildlife-trafficki

In [61]:
# Print the title, description and content of every returned article.
for article in articles:
    print("===================================================================================================================")
    print()
    print(f"Title: {article['title']}")
    print()
    print(f"Description: {article['description']}")
    print()
    print(f"Content: {article['content']}")
    print()


Title: ‘Doc’ Antle wildlife trafficking trial delayed until June 2023

Description: The trial for the Myrtle Beach man made famous in the Netflix series, “Tiger King” scheduled to begin Monday in Virginia is delayed. Again.

Content: MYRTLE BEACH, S.C. (WMBF) - The trial for the Myrtle Beach man made famous in the Netflix series, “Tiger King” scheduled to begin Monday in Virginia is delayed. Again.
The Frederick County Circuit Court confirmed the trial is rescheduled for June 12 ... [964 chars]


Title: Ghislaine Maxwell seen laughing, jogging on Florida prison track

Description: Notorious sex-trafficking madam Ghislaine Maxwell was filmed enjoying a morning jog in her low-security prison -- laughing hysterically with a pal in the Florida sunshine.

Content: Ghislaine Maxwell is back on the run — but this time safely behind bars.
The notorious sex-trafficking madam for late pedophile Jeffrey Epstein was filmed enjoying a morning jog in her low-security prison — laughing hysterically 

# GEOGRAPHICAL COVERAGE OF ARTICLES

### 773 articles for country=gb vs. 2982 for country=us

In [76]:
url = (
    f"https://gnews.io/api/v4/search?"
    f'q={query}&'
    f"token={API_key}&"
    "lang=en&"
    "country=gb&"
    "max=100&"
)

url

'https://gnews.io/api/v4/search?q="human%20trafficking"%20OR%20"modern%20slavery"%20OR%20"sexual%20exploitation"%20OR%20"trafficking"%20OR%20"domestic%20servitude"%20OR%20"forced%20labour"%20OR%20"debt%20bondage"%20OR%20"forced%20begging"%20OR%20"forced%20marriage"&token=e3cc2c83c86979be51f685512b43974c&lang=en&country=gb&max=100&'

In [77]:
with urllib.request.urlopen(url) as response:
    data = json.loads(response.read().decode("utf-8"))

In [78]:
data

{'totalArticles': 773,
 'articles': [{'title': 'Nova Scotia man charged in London sex trafficking investigation',
   'description': 'Your Local News Network serving London, Windsor, Chatham, Sarnia and Midwestern Ontario',
   'content': 'Cortez Downey, 26, of Nova Scotia. Photo courtesy of London police.\nShare via: Facebook\nTwitter\nLinkedIn\nMore\nLondon police believe there could be more victims after a Nova Scotia man was arrested for sex trafficking women in the city this summer.\nPo... [961 chars]',
   'url': 'https://blackburnnews.com/london/london-news/2022/10/28/nova-scotia-man-charged-london-sex-trafficking-investigation/',
   'image': 'https://blackburnnews.com/wp-content/uploads/2022/10/downey-400x250.png',
   'publishedAt': '2022-10-28T11:35:44Z',
   'source': {'name': 'BlackburnNews.com',
    'url': 'https://blackburnnews.com'}},
  {'title': 'Police seek more victims as man, 26, faces human-trafficking charges',
   'description': 'A Nova Scotia man is facing human traffi
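
The same comparison could be repeated for any list of countries by looping over country codes and reading `totalArticles` from each response. The sketch below is illustrative only: `count_by_country` is a hypothetical helper, and `replay_fetch` is a stand-in that replays the two totals recorded in this notebook instead of calling the API.

```python
def count_by_country(countries, build_url, fetch):
    """Return {country_code: totalArticles} using a caller-supplied URL builder and fetcher."""
    totals = {}
    for code in countries:
        response = fetch(build_url(code))
        totals[code] = response["totalArticles"]
    return totals


# Totals recorded in this notebook for the same keyword query.
recorded_totals = {"us": 2982, "gb": 773}


def replay_fetch(url):
    """Stand-in for a real request: look up the recorded total for the URL's country code."""
    code = url.rsplit("country=", 1)[1].split("&")[0]
    return {"totalArticles": recorded_totals[code], "articles": []}


totals = count_by_country(
    ["us", "gb"],
    lambda code: f"https://gnews.io/api/v4/search?country={code}",  # other parameters omitted
    replay_fetch,
)
ratio = totals["us"] / totals["gb"]
print(totals)
print(f"US/GB ratio: {ratio:.2f}")  # 3.86
```

In a live run, `fetch` would be replaced by a function that performs the actual request and `build_url` would include the query, token and language parameters.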

# DATES OF ARTICLES

In [83]:
from datetime import datetime as dt

articles = data["articles"]

# Convert each UTC timestamp (e.g. "2022-10-28T11:35:44Z") to a YYYY-MM-DD date string.
dates = [
    dt.strptime(article["publishedAt"], "%Y-%m-%dT%H:%M:%SZ").strftime("%Y-%m-%d")
    for article in articles
]

dates

['2022-10-28',
 '2022-10-27',
 '2022-10-27',
 '2022-10-26',
 '2022-10-25',
 '2022-10-25',
 '2022-10-25',
 '2022-10-25',
 '2022-10-25',
 '2022-10-24']
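
To put a number on the possible lag, the dates above can be tallied per day and compared against the day the query was run. This is a sketch under a stated assumption: the run date is not recorded in the notebook, so `assumed_run_date` is set to 2022-10-31, the newest `publishedAt` date seen in the US results.

```python
from collections import Counter
from datetime import date

# Dates copied from the output above (country=gb query).
dates = [
    "2022-10-28", "2022-10-27", "2022-10-27", "2022-10-26", "2022-10-25",
    "2022-10-25", "2022-10-25", "2022-10-25", "2022-10-25", "2022-10-24",
]

# Number of returned articles per publication day.
per_day = Counter(dates)

# Assumption: the query was run on 2022-10-31; the real run date is unknown.
assumed_run_date = date(2022, 10, 31)
newest = max(date.fromisoformat(d) for d in dates)
lag_days = (assumed_run_date - newest).days

print(per_day.most_common(1))  # [('2022-10-25', 5)]
print(lag_days)  # 3
```

Under that assumption the newest UK article is three days old, which is consistent with the suspected multi-day lag but does not confirm it.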