# More on requests

Many services are available on the web for free,
provided you know how to access them. Typically they
are offered by major companies to show off their latest
Deep Learning technologies.

- Google Translate: [BERT](https://fr.wikipedia.org/wiki/BERT_(mod%C3%A8le_de_langage)#:~:text=En%20traitement%20automatique%20du%20langage,en%20traitement%20automatique%20des%20langues.)
- Text to speech: [WaveNet](https://deepmind.com/blog/article/wavenet-generative-model-raw-audio)
- Image tagging: [InceptionNet](https://demos.algorithmia.com/image-tagger)

In theory you can build DL models
yourself in Colab, but often:

- you don't have enough data
- the code you want to run fails because of incompatibilities between library versions.

So I often end up *hijacking* an existing service. There are two ways to do this:
- sometimes there is a Python module that accesses the service, like [gTTS](https://github.com/pndurette/gTTS)
- sometimes you have to access it **directly** via Requests, using [query strings](https://en.wikipedia.org/wiki/Query_string)

Please read about query strings (chaînes de requête).
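
As a small illustration of what a query string is, here is a sketch using only the standard library. The base URL and the parameter names `q` and `lang` are made up for the example; they do not belong to any of the services above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameters, for illustration only
base_url = 'https://example.com/search'
params = {'q': 'text to speech', 'lang': 'fr'}

# urlencode turns the dictionary into key=value pairs joined by '&'
# and escapes characters that are not allowed in a URL (spaces become '+')
query_string = urlencode(params)

# The query string is attached to the URL after a '?'
full_url = base_url + '?' + query_string
print(full_url)
```

Requests does the same encoding for you when you pass a dictionary as `params`.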

---

Below we will use:
- ```gTTS``` to access Google Translate's text-to-speech (TTS)
- ```requests``` to access IBM's TTS service at https://text-to-speech-demo.ng.bluemix.net/

1. Google only has one **free** voice per language, but it is easier to access.
1. IBM has many voices, but I had to find the request string myself using Developer Tools in Chrome.

You can use [Developer Tools](https://developers.google.com/web/tools/chrome-devtools/network) too,
and I will help you if you need it, but it is an **advanced topic** so I won't teach it here.

---


# gTTS

You can install gTTS like this:

In [1]:
! pip install gTTS

Collecting gTTS
  Using cached gTTS-2.2.1-py3-none-any.whl (24 kB)
Installing collected packages: gTTS
Successfully installed gTTS-2.2.1


# How to use it

I save the speech to an ```mp3``` file, then click on it to play.

In [3]:
from gtts import gTTS

# Create an instance (tts: text to speech)
input_text = 'Hello World'
tts = gTTS(text=input_text, lang='en', slow=True)

# Parameters:
#   text - string - text to be spoken
#   lang - string - ISO 639-1 language code (supported by the Google Text to Speech API) to speak in
#   slow - boolean - speak slowly; default False (only two speeds are provided by the API)

# Write the audio to a file
tts.save('hello_world.mp3')

# IBM

In [9]:
# IBM

import requests

txt = 'hello how are you'
actor = 'en-GB_KateV3Voice'

url = 'https://text-to-speech-demo.ng.bluemix.net/api/v3/synthesize'
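params = {'text': txt,
          'voice': actor,
          'download': 'true',
          'accept': 'audio/mp3'}

r = requests.get(url, params=params)
r.raise_for_status()  # stop here if the server returned an error

# the response body is the raw mp3 data
with open('ibm.mp3', 'wb') as fp:
    fp.write(r.content)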

# Inspecting the request

When we call ```requests.get``` we send:
- headers
- a URL with an encoded query string, which contains the entries of the dictionary ```params```

There is an important entry in ```headers``` that you should set
so that websites don't treat you as a robot: the [user-agent](https://stackoverflow.com/questions/10606133/sending-user-agent-using-requests-library-in-python).
I often copy it from the Developer Tools in Chrome.
Here is what Chrome sends when I look at this page:
https://github.com/mrolarik/gTTS-google-text-to-speech/blob/master/gTTS%20-%20Thai%20language.ipynb

```
GET /mrolarik/gTTS-google-text-to-speech/blob/master/gTTS%20-%20Thai%20language.ipynb HTTP/1.1
Host: github.com
Connection: keep-alive
Cache-Control: max-age=0
DNT: 1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36
....
```
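
A minimal sketch of setting the user-agent with Requests, assuming the IBM demo URL and parameters used above and a shortened `Mozilla/5.0` string instead of the full Chrome one. The request is only prepared, not sent, so it can be inspected without touching the network.

```python
import requests

url = 'https://text-to-speech-demo.ng.bluemix.net/api/v3/synthesize'
params = {'text': 'hello how are you', 'voice': 'en-GB_KateV3Voice'}

# Shortened user-agent string (an assumption; copy the real one from your browser)
headers = {'User-Agent': 'Mozilla/5.0'}

# Build the request object and prepare it without sending anything
prepared = requests.Request('GET', url, params=params, headers=headers).prepare()

print(prepared.headers['User-Agent'])  # the header we asked for
print(prepared.url)                    # the URL with the encoded query string
```

To actually send it, pass the same `headers` and `params` to `requests.get`. After a real call, `r.request.headers` holds the headers that were sent, while `r.headers` holds the headers the server sent back.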

In [11]:
r.headers

{'X-Backside-Transport': 'OK OK', 'Connection': 'Keep-Alive', 'Transfer-Encoding': 'chunked', 'Date': 'Fri, 11 Dec 2020 08:09:12 GMT', 'Strict-Transport-Security': 'max-age=15552000; includeSubDomains', 'X-Content-Type-Options': 'nosniff', 'X-Dns-Prefetch-Control': 'off', 'X-Download-Options': 'noopen', 'X-Frame-Options': 'SAMEORIGIN', 'X-Ratelimit-Limit': '4', 'X-Ratelimit-Remaining': '3', 'X-Ratelimit-Reset': '1607674183', 'X-Xss-Protection': '1; mode=block', 'X-Global-Transaction-ID': 'cb47d0745fd329288182c0ef'}

In [12]:
r.url

'https://text-to-speech-demo.ng.bluemix.net/api/v3/synthesize?text=hello+how+are+you&voice=en-GB_KateV3Voice&download=true&accept=audio%2Fmp3'

Everything after ```?``` comes from ```params```
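
To check this, the URL shown above can be taken apart again with the standard library. This is only a sketch for inspection; Requests does not need it.

```python
from urllib.parse import urlsplit, parse_qs

# The value of r.url copied from the output above
url = ('https://text-to-speech-demo.ng.bluemix.net/api/v3/synthesize'
       '?text=hello+how+are+you&voice=en-GB_KateV3Voice'
       '&download=true&accept=audio%2Fmp3')

parts = urlsplit(url)

# parse_qs returns a list of values per key; each key appears once here,
# so we keep only the first value
decoded = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(parts.path)
print(decoded)
```

The decoded dictionary has the same keys and values as ```params```: `+` is turned back into a space and `%2F` into `/`.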

---
# A class Voices()

This interaction is fairly complicated, and it is slow
because we have to wait for the IBM server to reply, so I wrote a class to handle it.

You use the class like this:


In [31]:
speek = Voices() # create an instance
speek.add([('K','Imagination is more important than knowledge.')] ) # choose a voice and a text


Kimagination_is_more.mp3
skipping imagination_is_more.mp3
DONE


In [37]:
speek.add([('K','Imagination is more important than knowing.')] )

Kimagination_is_more.mp3
skipping imagination_is_more.mp3
DONE


- ```add``` takes a list of pairs ```(actor, txt)```, where ```actor``` is a short code for a voice (see the ```voices``` dictionary below)
- it checks whether the mp3 already exists, and saves time by not generating it again

It saves what has been done as JSON and reads it back the next time.
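
The caching idea can be sketched on its own, without any network access. This is an illustration under assumptions, not the class itself: the helper names `load_inventory` and `needs_audio`, the key `K_demo.mp3`, and the fake audio bytes are all made up for the example.

```python
import json
import os
import tempfile

def load_inventory(path):
    """Return the saved inventory, or an empty dict if no file exists yet."""
    if not os.path.isfile(path):
        return {}
    with open(path, 'r') as fp:
        return json.load(fp)

def needs_audio(inventory, key, text, audio_dir):
    """True when the text is new, has changed, or its mp3 is missing."""
    already_recorded = inventory.get(key) == text
    file_exists = os.path.isfile(os.path.join(audio_dir, key))
    return not (already_recorded and file_exists)

# Work in a throwaway directory so no real files are touched
with tempfile.TemporaryDirectory() as workdir:
    inventory_path = os.path.join(workdir, 'script.json')
    key = 'K_demo.mp3'  # hypothetical key
    text = 'Imagination is more important than knowledge.'

    inventory = load_inventory(inventory_path)
    first_time = needs_audio(inventory, key, text, workdir)

    # Pretend we fetched the audio, then record it in the inventory
    with open(os.path.join(workdir, key), 'wb') as fp:
        fp.write(b'fake audio bytes')
    inventory[key] = text
    with open(inventory_path, 'w') as fp:
        json.dump(inventory, fp)

    # A later session reloads the JSON and finds nothing left to do
    reloaded = load_inventory(inventory_path)
    second_time = needs_audio(reloaded, key, text, workdir)

print(first_time, second_time)
```

Checking both the inventory and the file on disk means a deleted mp3 is generated again even if the JSON still lists it.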

## Exercise
I haven't been very careful about how I chose file names and keys. **Fix this**.
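
One possible direction, as a hint rather than the expected answer: the two quotations used in the session above share their first three words, so a name built only from those words cannot tell them apart. The sketch below appends a short hash of the actor and the full text. The function name `text_to_filename` and the choice of SHA-1 truncated to 8 characters are assumptions made for this example.

```python
import hashlib
import re

def text_to_filename(actor, text, n_words=3):
    """Readable prefix from the first words, plus a short hash of actor and text."""
    words = re.sub(r'[^\w\s]', '', text).lower().split()
    prefix = '_'.join(words[:n_words]) or 'empty'
    # The hash depends on the whole text, so similar sentences get different names
    digest = hashlib.sha1((actor + '|' + text).encode('utf-8')).hexdigest()[:8]
    return f'{actor}_{prefix}_{digest}.mp3'

a = text_to_filename('K', 'Imagination is more important than knowledge.')
b = text_to_filename('K', 'Imagination is more important than knowing.')
print(a)
print(b)
```

The same string could serve as both the file name and the inventory key, which keeps the two from drifting apart.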

In [34]:
speek.inventory

{'Kimagination_is_more.mp3': 'Imagination is more important than knowing.'}

In [36]:
import os, re, time, sys
import subprocess

import json #serialise
import requests
from gtts import gTTS

class Voices():
    '''my class to read texts'''
    voices = {'K' : 'en-GB_KateV3Voice',
              'M' : 'en-US_MichaelV3Voice',
              'KK' : 'en-US_KevinV3Voice',
              'LI' : 'zh-CNLiNaVoice',
              'O' : 'en-US_OliviaV3Voice',
              'R' : 'fr-FR_ReneeV3Voice'
             }
    
    def __init__(self):
        # reload the inventory saved by a previous session, if there is one
        if not os.path.isfile('script.json'):
            self.inventory = {}
        else:
            with open('script.json', 'r') as fp:
                self.inventory = json.load(fp)

    def string2fn(self, xx):
        #you should make a better choice than me !!!
        '''hash function
        strip punctuation
        return first 3 words with sep=_'''
        words = re.sub(r'[^\w\s]', '', xx).lower().split() #strip punctuation - > lowercase
        #check and pad
        if len(words) < 3:
            words.extend(['blah']*3)
        return '_'.join(words[:3]) + '.mp3'

    def get_audio(self, to_say):

        actor, txt = to_say
        FN = self.string2fn(txt)
        print('Doing', FN)
        
        if actor in self.voices:
            url = 'https://text-to-speech-demo.ng.bluemix.net/api/v3/synthesize'
            params = {'text' : txt,
                      'voice' : self.voices[actor],
                      'download' : 'true',
                      'accept' : 'audio/mp3'
            }
         
            r = requests.get(url, params=params)
            r.raise_for_status()  # don't save an error page as audio

            with open(FN, 'wb') as FP:
                FP.write(r.content)

        else: #assume it's a language tag and ask google
            tts = gTTS(txt, lang=actor.lower())
            tts.save(FN)

    def add(self, txts):

        for tt in txts:
            actor, lines = tt
            FN = self.string2fn(lines)
            key = actor + FN
            print(key)
            if key in self.inventory and self.inventory[key] == lines:
                print('skipping', FN)
                continue
            self.inventory[key] = lines
            self.get_audio(tt)
            time.sleep(20)  # pause between requests to go easy on the server

        # save the whole inventory, not just this batch
        with open('script.json', 'w') as FP:
            json.dump(self.inventory, FP)
        print('DONE')
        
    def __repr__(self):
        return str('\n'.join(self.inventory.keys()))

# Exercise

This works:

In [38]:
pp = re.compile(r'station_24\.php\?id=(\d+)"><b>(.*?),(.*?)m')  # raw string so the backslashes reach the regex
user_agent = {'User-agent': 'Mozilla/5.0'}

url = 'http://romma.fr/frame_station24.php'

r = requests.get('http://romma.fr', 
                 headers=user_agent)

stations = pp.findall(r.text)

But this doesn't. **Fix it**:

In [42]:

url = 'http://romma.fr/carte.php'

#https://stackoverflow.com/questions/38489386/python-requests-403-forbidden

params = {'dept' : 0, 
          'param': 'temperature',
          'mobile' : 0,
          'carteinterne': 0}
  
r = requests.get(url, 
                headers=user_agent,
                params=params)

IndentationError: unexpected indent (<ipython-input-42-75d14ea38654>, line 11)

# Exercise

[Here](./einstein.txt) are some quotations to start with.

Use [this](https://github.com/ssut/py-googletrans) to translate the quotations:

- to French
- back to English




