# Website to Speech
Created by [Nuhu Ibrahim](https://nuhuibrahim.com)

## Project Goals
The goal of this project is to develop a Python script that:
1. converts a valid web URL into an image,
2. automatically describes the image using a caption, and
3. converts the caption into audio so that it can be read aloud and listened to.
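
The three steps above can be pictured as a simple pipeline. The sketch below is only an illustration, not the notebook's actual code: `website_to_speech` and its three callable parameters are hypothetical names, and the stand-in lambdas replace the real API calls made in the parts that follow.

```python
from typing import Callable


def website_to_speech(
    url: str,
    capture: Callable[[str], bytes],
    describe: Callable[[bytes], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Chain the three project steps: URL -> image -> caption -> audio."""
    image = capture(url)        # step 1: screenshot of the page
    caption = describe(image)   # step 2: caption for the screenshot
    return synthesize(caption)  # step 3: audio for the caption


# Stand-in functions (assumptions for illustration only, no network calls)
audio = website_to_speech(
    "https://example.com",
    capture=lambda url: b"fake-image-bytes",
    describe=lambda image: "a screenshot of a web page",
    synthesize=lambda text: text.encode("utf-8"),
)
```

Passing the steps in as functions keeps each one replaceable, so a different screenshot or captioning service could be swapped in without touching the rest.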

This project aims to help visually impaired people gain a general overview of a website so that they can decide whether it is worth spending time on. Going through a webpage completely generally takes a long time for people with visual impairment.

Thanks to the [Public APIs GitHub Repository](https://github.com/public-apis/public-apis.git), where all the APIs used in this project were first found.

### Part 0: Installing dependencies
These tasks are achieved by building on existing tools that already solve the major subtasks of this project: the "Restpack" screenshot API, which captures a screenshot of any webpage with one API call; "Cloudmersive", which provides deep learning APIs for recognizing and processing images; and the "IBM" text to speech API, which converts a given text to speech.

All the dependencies are installed or imported in the cell below:

In [1]:
!pip install cloudmersive-validate-api-client
!pip install cloudmersive-image-api-client
!pip install --upgrade "ibm-watson>=5.0.0"
!pip install playsound
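# PyObjC is only needed on macOS, where playsound uses it for playback
!pip install -U PyObjC
# pathlib is part of the Python 3 standard library, so it needs no pip install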

# Imports for restpack
import requests
import json

# Imports for cloudmersive
import cloudmersive_image_api_client

# Imports for IBM Text to Speech
from ibm_watson import TextToSpeechV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Playing the sound
from pathlib import Path
from playsound import playsound

### Part 1: Converting a website to an image
To achieve the conversion of website to image, the "Restpack" screenshot API is used. The Restpack website can be found [here](https://restpack.io/screenshot).

For this to work, you need to create a Restpack account [here](https://restpack.io/console/register) and then copy your access token into the `restpack_api_key` variable, which is sent as the "x-access-token" value in the headers dictionary.
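
Rather than pasting the token directly into the notebook, it can be read from an environment variable so that it is not saved with the file. The following is a minimal sketch of that idea; `read_api_key` is a hypothetical helper and the variable name `RESTPACK_API_KEY` mentioned afterwards is an assumption, not something Restpack defines.

```python
import os


def read_api_key(variable_name: str) -> str:
    """Return the key stored in an environment variable, or fail with a clear message."""
    key = os.environ.get(variable_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {variable_name} environment variable first.")
    return key
```

With the variable exported in the shell that starts the notebook, the cell below could then use `restpack_api_key = read_api_key("RESTPACK_API_KEY")`. The same helper would work for the Cloudmersive and IBM keys.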

In [2]:
restpack_api_key = 'Your Restpack API Key'
website_to_predict = 'https://google.com'

headers = {
  'Content-Type': 'application/json',
  'x-access-token': restpack_api_key
}
payload = {
    'url'    : website_to_predict,
    'json'   : 'true',
    'width'  : '1280', 
    'height' : '768',
    'format' : 'jpg',
    'mode'   : 'viewport',
}
url = 'https://restpack.io/api/screenshot/v6/capture'

# Send the capture options as a JSON body
response = requests.post(url, headers=headers, data=json.dumps(payload))

if response.status_code != 200:
    print("Sorry, an error occurred while converting the webpage to an image using Restpack, please try again.")
else:
    # The JSON response describes the capture and holds the image location
    image_properties = response.json()
    image_location = image_properties['image']

    # Download the screenshot and save it for the captioning step
    r = requests.get(image_location, allow_redirects=True)
    r.raise_for_status()

    with open('website-image.jpg', 'wb') as screenshot_file:
        screenshot_file.write(r.content)
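
The project goals call for a valid web URL, so it can help to check the input before spending an API call on it. Below is a minimal sketch using only the standard library; `is_valid_web_url` is a hypothetical helper, and it only checks the URL's shape (scheme and host), not whether the site is reachable.

```python
from urllib.parse import urlparse


def is_valid_web_url(candidate: str) -> bool:
    """Return True for absolute http(s) URLs that include a host name."""
    try:
        parsed = urlparse(candidate)
    except ValueError:
        # urlparse rejects some malformed inputs, such as broken IPv6 hosts
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

A check such as `is_valid_web_url(website_to_predict)` could run before the request is sent, with a spoken or printed error message when it returns `False`.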

### Part 2: Automatically describing the image using a simple caption
To generate a simple caption for the image, the "Cloudmersive" API, which applies deep learning to image recognition and processing, is used. The Cloudmersive website can be found [here](https://www.cloudmersive.com/image-recognition-and-processing-api).

To make use of this service, you first need to register on the Cloudmersive website [here](https://account.cloudmersive.com/signup) and then create an API key.

In [3]:
cloudmersive_api_key = 'Your Cloudmersive API Key'

# Configure API key authorization: Apikey
configuration = cloudmersive_image_api_client.Configuration()
configuration.api_key['Apikey'] = cloudmersive_api_key

# Uncomment below to setup prefix (e.g. Bearer) for API key, if needed
# configuration.api_key_prefix['Apikey'] = 'Bearer'

# create an instance of the API class
api_instance = cloudmersive_image_api_client.RecognizeApi(cloudmersive_image_api_client.ApiClient(configuration))
image_file = "website-image.jpg" # file | Image file to perform the operation on.  Common file formats such as PNG, JPEG are supported.

prediction = ""

try:
    # Describe an image in natural language
    api_response = api_instance.recognize_describe(image_file)

    # Always start with the spoken introduction, followed by the best caption
    prediction = "Please listen to our prediction of your webpage. " + api_response.best_outcome.description + "."

    # Warn the listener when the API reports low confidence
    if not api_response.highconfidence:
        prediction = prediction + " However, we are not very sure about this prediction."
except Exception as e:
    prediction = "Sorry, an error occurred while we were trying to guess the content of the page. Please try again."
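
The message-building logic in the cell above can be isolated into a small pure function, which makes it possible to check the wording without calling the Cloudmersive API. This is a sketch, assuming the introduction should be spoken in both the high-confidence and low-confidence cases; `build_prediction` is a hypothetical helper, not part of the Cloudmersive client.

```python
def build_prediction(description: str, high_confidence: bool) -> str:
    """Compose the sentence that will later be converted to speech."""
    message = "Please listen to our prediction of your webpage. " + description + "."
    if not high_confidence:
        # Tell the listener that the caption may be unreliable
        message += " However, we are not very sure about this prediction."
    return message
```

Inside the `try` block it could be called as `build_prediction(api_response.best_outcome.description, api_response.highconfidence)`.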

### Part 3: Converting the caption into audio so that it can be read aloud
To convert the caption to speech, the "IBM" text to speech API is used. The IBM text to speech website can be found [here](https://cloud.ibm.com/docs/text-to-speech/getting-started.html). You first need to create an account on the IBM text to speech website [here](https://cloud.ibm.com/registration?target=%2Fdocs%2Ftext-to-speech%2Fgetting-started.html), and then copy your API key and service URL into the variables in the cell below.

In [4]:
ibm_text_to_speech_api_key = 'IBM API Key'
ibm_service_url = 'IBM Service URL'

authenticator = IAMAuthenticator(ibm_text_to_speech_api_key)
text_to_speech = TextToSpeechV1(
    authenticator=authenticator
)

text_to_speech.set_service_url(ibm_service_url)

try:
    # Synthesize first so that a failed request does not leave an empty audio.wav behind
    audio_content = text_to_speech.synthesize(
        prediction,
        voice='en-US_AllisonV3Voice',
        accept='audio/wav'
    ).get_result().content

    with open('audio.wav', 'wb') as audio_file:
        audio_file.write(audio_content)
except Exception as ex:
    print("Sorry, we were unable to convert your website to speech, please try again.\n")

### Part 4: Playing the sound
The code below plays the generated audio aloud for the listener.

In [5]:
path = str(Path("audio.wav").resolve())
path = path.replace(" ", "%20")

playsound(path)
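
The cell above escapes only spaces in the path. If the folder name may contain other characters that need percent-encoding, the standard library can do the escaping. This is a sketch under two assumptions: the path is POSIX-style (macOS or Linux), and the playback backend expects a percent-encoded path, as the original code implies. Whether any encoding is needed at all depends on the platform backend that `playsound` uses. `encoded_audio_path` is a hypothetical helper.

```python
from pathlib import Path
from urllib.parse import quote


def encoded_audio_path(filename: str) -> str:
    """Resolve the file to an absolute path and percent-encode it."""
    absolute = Path(filename).resolve()
    # as_posix() keeps forward slashes; quote() leaves '/' untouched by default
    return quote(absolute.as_posix())
```

The result could be passed to `playsound` in place of the manually escaped `path`.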