Web Application

Taichen Rose edited this page May 24, 2021 · 10 revisions

The Web Application module is the user's point of interaction with our Tex2Speech application. It lets users upload LaTeX (.tex) files and/or Bibliography (.bib) files, configure basic pronunciation settings, and download the resulting audio. Behind the scenes, it feeds the files to TexParser, facilitates Amazon Web Services usage, and finally displays a download page from which users can save the audio to their own files. The application is built with Flask, a lightweight Python framework for building web applications.

Glossary

Speech Synthesis Markup Language (SSML): XML-based markup language for speech synthesis applications.

Amazon Web Services (AWS): Provides users with access to cloud computing platforms, APIs, and more. This team uses AWS because we chose Amazon Polly as our speech synthesizer, which renders input in Speech Synthesis Markup Language (SSML) format.

Elastic Beanstalk: An AWS service for deploying applications. It sets up the resources needed to host your application in the cloud, such as load balancing, scaling groups, and security groups.

Amazon Polly: AWS's speech synthesizer, which converts text into spoken audio. It supports SSML tags, which let the user customize pronunciation.

S3 Bucket (S3): A storage container in Amazon Simple Storage Service. It must be used with Amazon Polly because the function we call to synthesize our marked-up files writes the resulting audio file to an S3 bucket.

Web Application Architecture & Flow

The general flow of the Web Application Architecture:

[Figure: Web Application Architecture diagram]

Walking through the diagram, the user interacts with our web application, which is hosted on AWS Elastic Beanstalk. When the user uploads the documents to convert, they have the option to configure specific pronunciation. This is discussed in greater detail in another portion of this wiki. In short, the configuration allows the user to customize how their document is read. For example, a user can change the voice that reads their document, or choose how bold tags in LaTeX are spoken (with emphasis, at an increased rate, etc.).
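As a rough illustration of the bold-tag option described above, the sketch below maps a configuration choice to an SSML wrapper. The option names ("emphasis", "fast", "none") and the function name are hypothetical and are not taken from the Tex2Speech configuration; the emphasis and prosody elements themselves are standard SSML.

```python
from xml.sax.saxutils import escape

# Hypothetical option names; the real configuration keys are not shown on this page.
BOLD_STYLES = {
    "emphasis": '<emphasis level="strong">{}</emphasis>',
    "fast": '<prosody rate="fast">{}</prosody>',
    "none": "{}",
}


def render_bold(text, style="emphasis"):
    """Wrap the contents of a bold LaTeX command in the chosen SSML tag."""
    # Unknown styles fall back to plain text rather than raising an error.
    template = BOLD_STYLES.get(style, BOLD_STYLES["none"])
    # Escape &, <, and > so the text cannot break the surrounding markup.
    return template.format(escape(text))


print(render_bold("Theorem 1", "emphasis"))
print(render_bold("a < b", "fast"))
```

Keeping the choices in a single table like this would make it straightforward to add further speaking styles without touching the rendering function.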

When the user submits their documents and configuration choices (the configuration choices can be left empty), the files are handed to backend code that sorts the files, applies the configuration (based on the user's choices, if any were specified), and then feeds the files to TexParser. TexParser outputs the corresponding marked-up files in SSML. The web application then feeds these files to Amazon Polly, our speech synthesizer. Amazon Polly writes the resulting audio files to an S3 bucket for storage, and the web application creates a temporary pre-signed URL for each one. Finally, the web application directs the user to the download page, where the user can temporarily access the S3 bucket and retrieve their audio file using the pre-signed URL.

Files

application.py: This is the main file that runs our Flask application. Once all dependencies have been installed, you can run it by typing python3 application.py.

index.html: application.py renders this HTML file for the user. In application.py we dynamically build arrays of information and pass them to index.html; these populate the configuration choices offered to the user. On the index.html page, users can add LaTeX (.tex) files, Bibliography (.bib) files, Zip (.zip) files, and Tar (.tar.gz) files. After the user uploads the files and optionally sets the configuration, we handle the upload with functions that sort the files into separate arrays.

In application.py we sort the uploaded files. Main LaTeX files go into one array (we classify a .tex file as a main file if it contains \begin{document} and \end{document}), input files go into another (we classify a .tex file as an input file if it does not contain \begin{document} and \end{document}), and bib files go into a bib array. We then traverse all uploaded zip/tar files to find main files, input files, and bib files, and add them to their corresponding arrays.
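The sorting rule above can be sketched as follows. This is a minimal illustration using only the standard library, not the code in application.py; the function names and the (filename, bytes) input shape are assumptions made for the example.

```python
import io
import tarfile
import zipfile


def classify_tex(text):
    """Return 'main' if the text has both document delimiters, else 'input'."""
    if r"\begin{document}" in text and r"\end{document}" in text:
        return "main"
    return "input"


def iter_archive(name, data):
    """Yield (member_name, bytes) for each regular file in a zip or tar.gz."""
    if name.endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    yield info.filename, zf.read(info)
    elif name.endswith(".tar.gz"):
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tf:
            for member in tf.getmembers():
                if member.isfile():
                    yield member.name, tf.extractfile(member).read()


def sort_uploads(uploads):
    """Sort (filename, bytes) pairs into main, input, and bib lists."""
    buckets = {"main": [], "input": [], "bib": []}
    pending = list(uploads)
    while pending:
        name, data = pending.pop(0)
        if name.endswith((".zip", ".tar.gz")):
            # Archive members are queued and classified like direct uploads.
            pending.extend(iter_archive(name, data))
        elif name.endswith(".bib"):
            buckets["bib"].append(name)
        elif name.endswith(".tex"):
            kind = classify_tex(data.decode("utf-8", errors="replace"))
            buckets[kind].append(name)
    return buckets
```

Files with any other extension are silently skipped in this sketch; a real handler would more likely report them back to the user.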

aws_polly_render.py: Once all the files have been sorted into their separate arrays, aws_polly_render formats them by calling functions from format_master_files.py. These functions insert the corresponding input files into the main files and handle details such as comments and percent signs.
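To make the formatting step more concrete, here is a small sketch of inlining input files and stripping comments. It is an illustration under assumptions, not the contents of format_master_files.py: it assumes inputs are referenced with \input{name}, that input_files is a dict mapping file names to their text, and that a percent sign preceded by a backslash is a literal percent rather than a comment.

```python
import re

# A % starts a comment unless it is escaped as \% (a literal percent sign).
COMMENT = re.compile(r"(?<!\\)%.*")
INPUT = re.compile(r"\\input\{([^}]+)\}")


def strip_comments(text):
    """Remove LaTeX comments while keeping escaped percent signs."""
    # "." does not match newlines, so each match stops at the end of its line.
    return COMMENT.sub("", text)


def inline_inputs(text, input_files, depth=0):
    """Replace each \\input{name} with the contents of the matching file."""
    if depth > 10:  # guard against files that include each other
        return text

    def replace(match):
        name = match.group(1).strip()
        if not name.endswith(".tex"):
            name += ".tex"
        body = input_files.get(name)
        if body is None:
            return match.group(0)  # leave unknown inputs untouched
        return inline_inputs(strip_comments(body), input_files, depth + 1)

    return INPUT.sub(replace, text)
```

The regular expression treats a doubled backslash followed by a percent sign as escaped too, which is one of the edge cases a production version would need to handle separately.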

From format_master_files.py, the files are handed back to aws_polly_render.py and then passed to TexParser to be converted into SSML. A few smaller modules run before TexParser, such as expand_labels.py and doc_preprocess.py. In short, expand_labels handles \label and \ref commands, while doc_preprocess processes all files and removes commands that TexParser does not handle well.

TexParser outputs the newly created SSML files and gives them back to aws_polly_render. From here, we call Amazon Polly to synthesize the files and store the audio in the S3 bucket. In the same step, we generate the pre-signed link so the user can access their files.
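The synthesis and link-generation step might look roughly like the sketch below, which uses the boto3 calls start_speech_synthesis_task and generate_presigned_url. The helper names, the default voice, the region, and the one-hour expiry are assumptions for illustration, and the key extraction assumes the task's OutputUri ends with "/<bucket>/<key>". The clients are passed in as arguments so the helpers can be exercised without AWS credentials.

```python
def synthesize_to_s3(polly, ssml, bucket, voice="Joanna"):
    """Start an asynchronous Polly task that writes an MP3 into the bucket."""
    response = polly.start_speech_synthesis_task(
        OutputFormat="mp3",
        OutputS3BucketName=bucket,
        Text=ssml,
        TextType="ssml",
        VoiceId=voice,
    )
    return response["SynthesisTask"]


def presign_audio(s3, task, bucket, expires_in=3600):
    """Create a time-limited download link for the task's output object."""
    # Assumption: OutputUri ends with "/<bucket>/<key>".
    key = task["OutputUri"].split(f"/{bucket}/", 1)[1]
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )


def make_clients(region="us-east-1"):
    """Build real clients; requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the helpers above can be tested offline

    polly = boto3.client("polly", region_name=region)
    s3 = boto3.client("s3", region_name=region)
    return polly, s3
```

Because the synthesis task is asynchronous, a link generated this way only works once the task has finished writing the object; polling get_speech_synthesis_task with the returned TaskId is one way to check for completion.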

Returning to application.py with the audio links, we render the download.html page for the user to get their links.

Agile Format

User Stories

As a researcher, I want to access Tex2Speech from any device with a web browser so I can conveniently convert LaTeX documents to audio from anywhere.

As a researcher, I want to use a file browser to upload LaTeX documents of my choosing so I can conveniently convert LaTeX documents to audio from anywhere.

As a researcher, I want to be able to upload multiple files at a time along with .bib files for external bibliographies so I can easily get multiple audio outputs for multiple documents at a time.

As a researcher, I want to receive a clean error response upon submitting an incorrect file format so I can submit a correct one or correct my errors.

As a researcher, I want to be able to download my audio file created from my document so I can review and listen to it.

As a researcher, I want to be the only one who can access my audio file so I don't have to worry about privacy and security.

As a researcher, I want my upload time to be fast so I won't be waiting forever to get my audio files.

User Story Description

Users want a website that allows them to upload any LaTeX document from their local machine, along with its corresponding bibliography (.bib) files and input files. After uploading and submitting a LaTeX document, the user expects to be able to download the resulting audio for that document.

Only .tex files should be available for upload; any other file type, for example pdf or img, should not be accepted for submission. The user who uploads a LaTeX document should be the only user who can access its audio.

Design

This web application supports two user actions. First, users can upload LaTeX files. Second, they can download the resulting audio file. The application is built on Flask, a third-party Python library that allows developers to create web applications. Because we decided to use strictly Python for our parsing implementation, we chose Python packages so that our Python scripts could easily be added to the web application.

We created the upload and download pages with basic HTML, CSS, and JavaScript served by Flask. Behind them, Python scripts handle the uploaded file and convert it into an audio file. For uploading, we verify that a .tex file has been uploaded, and only a .tex file is submitted for processing, along with its corresponding bibliography files or input files. Once we get the marked-up files from the LaTeX parser, the web application feeds them through Amazon Polly, a speech synthesizer, and the audio file is uploaded to an S3 bucket.
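A check of this kind could be sketched as below. The accepted extensions follow the file types listed in the Files section; the function name and the error messages are hypothetical and only illustrate returning a clean error response instead of failing later in the pipeline.

```python
ALLOWED_EXTENSIONS = (".tex", ".bib", ".zip", ".tar.gz")
# Extensions that can supply a main LaTeX document, directly or in an archive.
DOCUMENT_EXTENSIONS = (".tex", ".zip", ".tar.gz")


def validate_upload(filenames):
    """Return a list of user-facing error messages; empty means accepted."""
    if not filenames:
        return ["Please select at least one file to upload."]
    errors = []
    for name in filenames:
        if not name.lower().endswith(ALLOWED_EXTENSIONS):
            errors.append(f"'{name}' is not a supported file type.")
    if not any(n.lower().endswith(DOCUMENT_EXTENSIONS) for n in filenames):
        errors.append("Upload at least one .tex file or an archive containing one.")
    return errors


print(validate_upload(["paper.tex", "refs.bib"]))
print(validate_upload(["paper.pdf"]))
```

Collecting every problem into one list, rather than stopping at the first, lets the upload page show the user everything that needs fixing in a single response.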

Downloading the file requires a pre-signed URL generated for the object in the S3 bucket. The S3 bucket is a space on AWS that allows us to upload files and data. Users access this bucket through the pre-signed URL, which acts like a virtual key: it lets its holder reach the S3 bucket and view or download their specific audio file. The pre-signed URL expires after a given amount of time, and the audio file object is deleted from the bucket at expiration. Once the upload has passed through the LaTeX parser and Amazon Polly, the user is directed to a new page where they can download their project audio.

Future Work

  • The images on the web app are png files. Future developers should remake these images in svg format so that they look better on all devices and browsers.
  • The design of the options menu could be refactored so that it automatically creates a new item in the table for each item in the config YAML file.

Known Web Application Errors

Create Separate Upload Folders for Each User

For testing purposes and simpler development, we have used a single upload folder. To allow multiple users to use the same application in the future, it is necessary to create a separate folder for each user. When a user uploads documents, those documents should be stored in, and associated with, that user's folder only.
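One possible approach to separate upload folders is sketched below, using only the standard library. It creates a uniquely named folder per upload session; the function names and the use of a random UUID as the folder name are assumptions, not part of the current application.

```python
import uuid
from pathlib import Path


def create_upload_dir(base_dir):
    """Create and return a fresh, uniquely named folder for one upload."""
    folder = Path(base_dir) / uuid.uuid4().hex
    # exist_ok=False makes an (extremely unlikely) name collision an error.
    folder.mkdir(parents=True, exist_ok=False)
    return folder


def save_upload(folder, filename, data):
    """Write one uploaded file, discarding any directory part of its name."""
    safe_name = Path(filename).name  # drops components such as "../"
    if not safe_name:
        raise ValueError("Uploaded file has no usable name.")
    target = folder / safe_name
    target.write_bytes(data)
    return target
```

In a Flask application, the folder name could be kept in the user's session so later requests find the same files, and Werkzeug's secure_filename helper is a common alternative for sanitizing the uploaded name.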