# Web APIs for text processing

You will need to [generate a private key at RapidAPI](https://rapidapi.com/) before you can use it here.

Note that you also need to **Subscribe** to a given API (click on _Pricing_) before you can use it.

In [1]:
# I keep mine in another file (secrets.py), which is not sent to GitHub.
# Or put yours in the string 'key' below.
# Note: a local secrets.py shadows Python's standard-library `secrets` module.
try:
    from secrets import mashape as key2
    from secrets import mashape_old as key
except ImportError:
    key = " KEY GOES HERE "
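An alternative to a local secrets file is an environment variable. The sketch below assumes a hypothetical variable name, `RAPIDAPI_KEY`, which is not part of the original notebook, and falls back to the same placeholder string when the variable is unset.

```python
import os

PLACEHOLDER = " KEY GOES HERE "


def load_key(env_var="RAPIDAPI_KEY", default=PLACEHOLDER):
    """Return the API key from the environment, or a placeholder if unset."""
    # RAPIDAPI_KEY is a hypothetical name chosen for this sketch.
    return os.environ.get(env_var, default)


key_from_env = load_key()
```

Keeping the result in a separate name (`key_from_env`) leaves the `key` defined above untouched.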

# Start with some sample text
Let's start with some text from [a geological abstract from GAC 2015](http://www.gac.ca/wp/wp-content/uploads/2011/09/2015_Joint_Assembly_Abstract_Proceedings.pdf) (page 175)...

- Abstract ID: 34562
- Final Number: GP22A-03
- Title: Ediacaran Paleomagnetism of Well-dated Units in Laurentia and West Avalonia: Implications for Models of Oscillatory True Polar Wander, Equatorial Dipoles and Rapid Continental Drift
- Presenter/First Author: Kenneth L Buchan, Geological Survey of Canada, kbuchan@nrcan.gc.ca
- Co-authors: Michael A Hamilton, University of Toronto, Toronto, ON, Canada; Joseph P Hodych, Memorial University, St. John's, NL, Canada

In [2]:
t = """Ediacaran paleomagnetic data from Laurentia are complex,
with inclinations of presumed primary remanences that differ by up to 90°
or more, often within single geological units. These unusual data have
been variously interpreted as due to magnetic overprinting, very rapid
continental drift, one or more episodes of oscillatory ~90° true polar
wander (TPW), or unusual behaviour of the geomagnetic field such as an
equatorial dipole. Here we review the Laurentia data in the 615-565 Ma
period. The ages assigned to steep and normal components, if they are
primary, appear to require at least two full oscillations during the period in
question. There is growing evidence (especially from the Grenville and
Rideau dyke swarms for which 9 precise U-Pb baddeleyite ages are now
available) indicating that the magnetic directional changes are much too
rapid to accommodate either rapid drift or TPW (using current theoretical
models). In addition, the paleomagnetic data do not always conform to an
equatorial dipole model in which paleopoles should differ by 90°. We also
review the paleomagnetic data from well-dated (606-570 Ma) Ediacaran
units of the West Avalonia microcontinent, which appear to be simpler
than those from Laurentia. Unlike Laurentia units, individual units of West
Avalonia usually carry a single presumed primary remanence direction (of
dual polarity), rather than two discrete remanence directions or directions
that are streaked along a great circle that might record rapid TPW. Large
directional (mainly declination) changes between units are usually
interpreted as due to block rotations, but alternatively could reflect TPW
or unusual behaviour of the magnetic field. However, the corresponding
paleopole changes are significantly less than the 90⁰ expected for an
equatorial dipole model. Taken together, the Ediacaran Laurentia and
Avalonia data do not appear consistent with current models of oscillatory
TPW, an equatorial dipole or unusually fast drift."""

We'll do a bit of preconditioning.

In [3]:
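import urllib.parse
import re

# Caution: replacing newlines with '' joins the last word of each line to the
# first word of the next (e.g. 'complex,with'); use ' ' to keep words apart.
t = re.sub(r'\n', '', t)
# This matches the degree sign only; the abstract also contains one
# superscript zero (90⁰), which is left unchanged.
t = re.sub(r'°', ' degrees ', t)

# URL-encoding is not needed here: requests encodes form data and query parameters.
#text = urllib.parse.quote_plus(t)
text = t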
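As a minimal sketch of a gentler preconditioning step, the function below replaces newlines with single spaces so that words at line ends stay separate. It also treats the superscript zero as a degree sign. The name `precondition` is hypothetical and not part of the notebook.

```python
import re


def precondition(raw):
    """Flatten a multi-line string and spell out degree signs."""
    # Match the degree sign and the superscript zero sometimes typed in its place.
    spelled = re.sub(r"[°⁰]", " degrees ", raw)
    # Collapse every run of whitespace, including newlines, to one space.
    return re.sub(r"\s+", " ", spelled).strip()


sample = "differ by up to 90°\nor more, often within single\ngeological units."
print(precondition(sample))
```

Note that a degree sign directly before punctuation still leaves a space before the punctuation mark.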

## Topic tagging

In [4]:
import requests

response = requests.post("https://twinword-topic-tagging.p.rapidapi.com/generate/",
  headers={
    "X-RapidAPI-Key": key,
    "Content-Type": "application/x-www-form-urlencoded",
    "Accept": "application/json"
  },
  data={
    "text": text,
  }
)

In [5]:
response

<Response [200]>

In [6]:
response.json()

{'keyword': {'data': 6,
  'unit': 4,
  'change': 3,
  'direction': 3,
  'appear': 3,
  'degree': 3,
  'unusual': 3,
  'model': 3,
  'magnetic': 3,
  'drift': 3},
 'topic': {'move': 0.11494061954987,
  'book': 0.11494061954987,
  'special': 0.11494061954987,
  'difference': 0.10057304210614,
  'time': 0.10057304210614,
  'change': 0.10057304210614,
  'country': 0.10057304210614,
  'mode': 0.10057304210614,
  'separate': 0.10057304210614,
  'character': 0.10057304210614},
 'version': '5.0.0',
 'author': 'twinword inc.',
 'email': 'help@twinword.com',
 'result_code': '200',
 'result_msg': 'Success'}
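The `keyword` and `topic` entries are plain dictionaries of scores, so ranking them takes one `sorted` call. This is a small sketch using a few values copied from the response above; `top_items` is a hypothetical helper name.

```python
def top_items(scores, n=3):
    """Return the n highest-scoring (name, score) pairs, best first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]


# A few entries copied from the response above.
result = {
    "keyword": {"data": 6, "unit": 4, "change": 3},
    "topic": {"move": 0.11494061954987, "difference": 0.10057304210614},
}

print(top_items(result["keyword"], 2))
print(top_items(result["topic"], 1))
```

Because `sorted` is stable, entries with equal scores keep the order in which the service returned them.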

## Summarization

**Aylien** has a lot of endpoints: https://rapidapi.com/aylien/api/text-analysis

In [7]:
url = "https://aylien-text.p.rapidapi.com/summarize"
params = {'text': text,
          'title': 'The Ediacaran of Laurentia'}

In [8]:
import requests

response = requests.get(url,
  headers={
    "X-RapidAPI-Key": key,
    "Accept": "application/json"
  },
  params=params
)

In [9]:
response.status_code

200

In [10]:
response.json()['sentences']

['Ediacaran paleomagnetic data from Laurentia are complex,with inclinations of presumed primary remanences that differ by up to 90 degrees or more, often within single geological units.',
 'These unusual data havebeen variously interpreted as due to magnetic overprinting, very rapidcontinental drift, one or more episodes of oscillatory ~90 degrees  true polarwander (TPW), or unusual behaviour of the geomagnetic field such as anequatorial dipole.',
 'In addition, the paleomagnetic data do not always conform to anequatorial dipole model in which paleopoles should differ by 90 degrees .',
 'Unlike Laurentia units, individual units of WestAvalonia usually carry a single presumed primary remanence direction (ofdual polarity), rather than two discrete remanence directions or directionsthat are streaked along a great circle that might record rapid TPW.',
 'Taken together, the Ediacaran Laurentia andAvalonia data do not appear consistent with current models of oscillatoryTPW, an equatorial dip

## Readability metrics

In [11]:
url = "https://ipeirotis-readability-metrics.p.rapidapi.com/getReadabilityMetrics"
params = {'text': text}
headers = {"X-RapidAPI-Key": key,  # Your private RapidAPI key
           "Content-Type": "application/x-www-form-urlencoded",
           "Accept": "application/json"
          }

In [12]:
response = requests.post(url, headers=headers, data=params)

In [13]:
print(response.text)

{
 "COLEMAN_LIAU": 18.670,
 "GUNNING_FOG": 21.600,
 "ARI": 19.298,
 "SENTENCES": 11.000,
 "SYLLABLES": 596.000,
 "SMOG_INDEX": 18.045,
 "COMPLEXWORDS": 83.000,
 "FLESCH_KINCAID": 19.910,
 "WORDS": 272.000,
 "CHARACTERS": 1638.000,
 "FLESCH_READING": -3.637,
 "SMOG": 18.821
}
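The response includes the raw counts, so two of the scores can be checked by hand. The sketch below uses the standard published Flesch Reading Ease and Flesch-Kincaid Grade Level formulas. That the service implements exactly these formulas is an assumption, although the results agree with its `FLESCH_READING` and `FLESCH_KINCAID` values to three decimals.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher means easier to read."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Counts copied from the response above.
counts = {"words": 272, "sentences": 11, "syllables": 596}

print(round(flesch_reading_ease(**counts), 3))
print(round(flesch_kincaid_grade(**counts), 3))
```

A negative reading-ease score and a grade level near 20 both say the same thing: the abstract has long sentences and many polysyllabic words.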


## Sentence case

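**This API seems to be broken:** the requests below return a 504 or hang, so the remarks about the results presumably refer to an earlier, working run.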

**Sprawk** is handy. `'mode'` can be `default, title, sentence, lower, upper, nospace`.

In [21]:
title = "Ediacaran Paleomagnetism of Well-dated Units in Laurentia and West Avalonia: Implications for Models of Oscillatory True Polar Wander, Equatorial Dipoles and Rapid Continental Drift"

url = "https://sprawkcapitalizer.p.rapidapi.com/api/applyCaps"
params = {'text': title,
          'lang': 'autoDetect',
          'mode': 'sentence'}
headers = {"X-RapidAPI-Key": key,
           "X-RapidAPI-Host": "sprawkcapitalizer.p.rapidapi.com"}

response = requests.get(url, headers=headers, params=params)

In [22]:
response

<Response [504]>

Almost but not quite.

In [23]:
title = "Giant Dinosaurs Found in Utah at Smith's Canyon"
params = {'text': title,
          'lang': 'autoDetect',
          'mode': 'sentence'}
response = requests.get(url, headers=headers, params=params)

KeyboardInterrupt: 

In [None]:
response

That is actually pretty awesome.
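A `504` is a gateway timeout, and the `KeyboardInterrupt` above suggests the second request hung until it was stopped by hand. Passing `timeout=` to `requests.get` makes a stalled call fail after a fixed wait instead. The wrapper below is a sketch: `get_with_timeout` and its `http_get` hook are hypothetical names, and the hook exists only so the function can be exercised without a network.

```python
def get_with_timeout(url, headers, params, timeout=10, http_get=None):
    """Return (status_code, body_text), or (None, reason) if the call failed."""
    if http_get is None:
        import requests  # imported lazily so the sketch can be tried offline
        http_get = requests.get
    try:
        response = http_get(url, headers=headers, params=params, timeout=timeout)
    except OSError as err:
        # requests' exceptions derive from IOError (an alias of OSError),
        # so this also covers its timeout and connection errors.
        return None, f"{type(err).__name__}: {err}"
    return response.status_code, response.text
```

With the cell above, this could be called as `get_with_timeout(url, headers, params, timeout=10)`; a `None` status then means the request never completed.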

## Sentiment analysis

There are a lot of sentiment analysis services!

In [37]:
text = "Not sure about this. Might be dodgy. Please check!"

### Japerk

In [38]:
url = "https://japerk-text-processing.p.rapidapi.com/sentiment/"
params = {'text': text,
          'language': 'english'}
headers = {"X-RapidAPI-Key": key,
           "X-RapidAPI-Host": "japerk-text-processing.p.rapidapi.com",
           "Content-Type": "application/x-www-form-urlencoded",
           "Accept": "application/json"
          }

In [39]:
response = requests.post(url, headers=headers, data=params)

In [40]:
print(response.text)

{"probability": {"neg": 0.58060346952523922, "neutral": 0.052386105650424196, "pos": 0.41939653047476078}, "label": "neg"}
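The body is JSON, so the label and its probability can be pulled out directly. This sketch parses a copy of the response text shown above rather than calling the service.

```python
import json

# Response text copied from the output above.
raw = ('{"probability": {"neg": 0.58060346952523922, '
       '"neutral": 0.052386105650424196, "pos": 0.41939653047476078}, '
       '"label": "neg"}')

result = json.loads(raw)
label = result["label"]
confidence = result["probability"][label]

print(f"{label}: {confidence:.2f}")
```

In this response `neg` and `pos` sum to 1, while `neutral` is reported separately.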


### TwinWord

[docs](https://rapidapi.com/twinword/api/sentiment-analysis)

In [41]:
url = "https://twinword-sentiment-analysis.p.rapidapi.com/analyze/"
params = {'text': text,
          'language': 'english'}
headers = {"X-RapidAPI-Key": key,
           "X-RapidAPI-Host": "twinword-sentiment-analysis.p.rapidapi.com",
           "Content-Type": "application/x-www-form-urlencoded",
           "Accept": "application/json"
          }

In [42]:
response = requests.post(url, headers=headers, data=params)

In [43]:
response.json()

{'type': 'negative',
 'score': -0.13164588475,
 'ratio': -0.57505746580046,
 'keywords': [{'word': 'please', 'score': 0.137689844},
  {'word': 'sure', 'score': 0.056871358},
  {'word': 'not', 'score': -0.625},
  {'word': 'check', 'score': -0.096144741}],
 'version': '4.0.0',
 'author': 'twinword inc.',
 'email': 'help@twinword.com',
 'result_code': '200',
 'result_msg': 'Success'}

### Microsoft Text Analytics

> The API returns a numeric score between 0 and 1. Scores close to 1 indicate positive sentiment, while scores close to 0 indicate negative sentiment. Sentiment score is generated using classification techniques. The input features to the classifier include n-grams, features generated from part-of-speech tags, and word embeddings.

**The `documents` list can hold several documents, so you can send multiple texts in one request.**

In [57]:
import json

url = "https://microsoft-azure-text-analytics-v1.p.rapidapi.com/sentiment"
params = json.dumps({'documents': [{"language":"en","id":"string","text":text}]})
headers = {"X-RapidAPI-Key": key,
           "X-RapidAPI-Host": "microsoft-azure-text-analytics-v1.p.rapidapi.com",
           "Content-Type": "application/json",
           "Accept": "application/json"
          }

In [58]:
response = requests.post(url, headers=headers, data=params)

In [59]:
response.json()

{'documents': [{'id': 'string', 'score': 0.17616775631904602}], 'errors': []}
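Since the request body carries a list under `documents`, several texts can go into one call. The sketch below builds such a body. The numbering scheme for `id` is an assumption of this example; the only requirement relied on here is that each `id` is distinct, so results can be matched back to inputs.

```python
import json


def build_documents(texts, language="en"):
    """Serialize several texts into one request body with ids '1', '2', ..."""
    documents = [
        {"language": language, "id": str(number), "text": doc_text}
        for number, doc_text in enumerate(texts, start=1)
    ]
    return json.dumps({"documents": documents})


body = build_documents(["Not sure about this.", "Might be dodgy.", "Please check!"])
print(body)
```

The resulting string can be passed as `data=` in the `requests.post` call, in place of `params`.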

### TextAnalysis

#### PATTERN

In [72]:
url = "https://textanalysis.p.rapidapi.com/pattern-sentiment-analysis"
params = {'text': text}
headers = {"X-RapidAPI-Key": key,
           "X-RapidAPI-Host": "textanalysis.p.rapidapi.com",
           "Content-Type": "application/x-www-form-urlencoded",
          }

In [73]:
response = requests.post(url, headers=headers, data=params)

In [74]:
response.json()

{'Polarity': -0.3125, 'Subjectivity': 0.8888888888888888}

#### BLOB

In [78]:
url = "https://textanalysis.p.rapidapi.com/textblob-sentiment-analysis"
params = {'text': text}
headers = {"X-RapidAPI-Key": key,
           "X-RapidAPI-Host": "textanalysis.p.rapidapi.com",
           "Content-Type": "application/x-www-form-urlencoded",
          }

In [79]:
response = requests.post(url, headers=headers, data=params)

In [80]:
response.json()

{'Polarity': -0.3125, 'Subjectivity': 0.8888888888888888}

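This is exactly the same result as the PATTERN endpoint. That would be expected if the service uses TextBlob's default analyzer, which is based on Pattern.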
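The services above report sentiment on different scales: Microsoft returns a score between 0 and 1, while Pattern's polarity runs from -1 to 1. For a rough side-by-side look, the sketch below rescales the Microsoft score linearly. The linear mapping is an assumption of this example, and scores from different models are not calibrated against each other, so only the sign and rough magnitude are worth comparing.

```python
def unit_to_polarity(score):
    """Map a score in [0, 1] linearly onto [-1, 1]."""
    return 2.0 * score - 1.0


microsoft_score = 0.17616775631904602  # from the Microsoft response above
pattern_polarity = -0.3125             # from the PATTERN response above

print(round(unit_to_polarity(microsoft_score), 3), pattern_polarity)
```

Both values come out negative, in line with the `neg` and `negative` labels from Japerk and TwinWord.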

## Language detection

In [87]:
text_n = "Dette her er dårlig, jeg like det ikke."

### TextAnalysis

In [88]:
url = "https://textanalysis.p.rapidapi.com/langid-language-detection"
params = {'text': text_n}
headers = {"X-RapidAPI-Key": key,
           "X-RapidAPI-Host": "textanalysis.p.rapidapi.com",
           "Content-Type": "application/x-www-form-urlencoded",
          }

In [89]:
response = requests.post(url, headers=headers, data=params)

In [90]:
response.json()

{'confidence': 0.493244138053385, 'language': 'nb'}

`'nb'` is Norwegian Bokmål.
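The code is an ISO 639-1 language code, and the confidence here is just under 0.5. A small lookup can make such results easier to read. In this sketch the table covers only a handful of codes, and the 0.5 threshold and the name `describe` are arbitrary choices for illustration.

```python
# A few ISO 639-1 codes; extend as needed.
LANGUAGE_NAMES = {
    "nb": "Norwegian Bokmål",
    "nn": "Norwegian Nynorsk",
    "no": "Norwegian",
    "en": "English",
}


def describe(result, min_confidence=0.5):
    """Return a readable language name, flagged if confidence is low."""
    name = LANGUAGE_NAMES.get(result["language"], result["language"])
    flag = "" if result["confidence"] >= min_confidence else " (low confidence)"
    return f"{name}{flag}"


print(describe({"confidence": 0.493244138053385, "language": "nb"}))
```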

### CloudNLP

In [69]:
url = "https://mtnfog-cloud-nlp-v1.p.rapidapi.com/language"
params = {'text': text_n}
headers = {"X-RapidAPI-Key": key,
           "X-RapidAPI-Host": "mtnfog-cloud-nlp-v1.p.rapidapi.com",
          }

In [70]:
response = requests.get(url, headers=headers, params=params)

In [71]:
response.json()

'en'

So that's rubbish: the text is Norwegian, not English.