Screenplay Parser

This is a screenplay parser that extracts dialogues between characters. Note that it only extracts a dialogue when the second character's line has a parenthetical. The scripts are crawled from http://www.imsdb.com/ .
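To make the parenthetical rule concrete, here is a minimal sketch of how such a rule-based extraction could look. It is not the code in html_list_parser.py: the function names `parse_turns` and `extract_exchanges`, the regular expressions, and the assumed layout (character name in capitals on its own line, an optional parenthetical line, then the speech, with blank lines between blocks) are all illustrative assumptions.

```python
import re

# Assumption: a character cue is a line written entirely in capitals.
CHARACTER = re.compile(r"^[A-Z][A-Z0-9 .'\-]*$")
# Assumption: a parenthetical is a line fully wrapped in parentheses.
PARENTHETICAL = re.compile(r"^\(.*\)$")


def parse_turns(script_text):
    """Split plain script text into speaking turns (hypothetical helper)."""
    turns = []
    current = None
    for raw in script_text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the block that is currently open.
            if current is not None:
                turns.append(current)
                current = None
            continue
        if current is None:
            # Only an all-caps line can open a new block; other text is skipped.
            if CHARACTER.match(line):
                current = {"character": line, "parenthetical": None, "speech": []}
            continue
        if PARENTHETICAL.match(line) and not current["speech"]:
            current["parenthetical"] = line[1:-1]
        else:
            current["speech"].append(line)
    if current is not None:
        turns.append(current)
    # All-caps lines with no speech (e.g. scene headings) are dropped here.
    return [t for t in turns if t["speech"]]


def extract_exchanges(turns):
    """Keep consecutive turns where the second speaker has a parenthetical."""
    exchanges = []
    for first, second in zip(turns, turns[1:]):
        if second["parenthetical"] and first["character"] != second["character"]:
            exchanges.append((
                first["character"],
                " ".join(first["speech"]),
                second["character"],
                second["parenthetical"],
                " ".join(second["speech"]),
            ))
    return exchanges
```

Real scripts on the site are HTML and far less regular than this layout, so a sketch like this would need extra rules for indentation, page breaks, and scene headings that are followed directly by action text.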

Getting Started

Create a new environment
Clone the repository
Install the dependencies: `pip install -r requirements.txt`
Run Scrapy: go to the brickset-scraper folder and run this in your terminal:
```
 scrapy runspider scraper.py --output=data/names_links.json
```
This will generate data/names_links.json.
Run `python json_parser.py data/names_links.json`. This reads names_links.json and creates all_name_script.txt. The new text file has a movie name and a link to its script for each movie in the JSON file. Note that each script takes 1-2 seconds.
Run `python html_list_parser.py`. This reads all_name_script.txt and generates all_dialogues.txt, which contains all the relevant dialogues from the movie scripts.
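As a rough illustration of the JSON-to-text step above, the sketch below turns a Scrapy JSON export into one line per movie. It is not the actual json_parser.py: the function name `write_name_link_list`, the `name` and `link` keys, and the tab-separated output are assumptions made for the example, and the real script may also fetch each movie page to resolve the script link.

```python
import json


def write_name_link_list(json_path, out_path):
    """Write one "name<TAB>link" line per entry (hypothetical format)."""
    # Scrapy's JSON feed export is a single JSON array of scraped items.
    with open(json_path, encoding="utf-8") as f:
        entries = json.load(f)
    with open(out_path, "w", encoding="utf-8") as out:
        for entry in entries:
            # Assumed keys; the real spider may name its fields differently.
            out.write(f"{entry['name']}\t{entry['link']}\n")
    return len(entries)
```

Under these assumptions, calling `write_name_link_list("data/names_links.json", "all_name_script.txt")` would return the number of movies written.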

Prerequisites

You need to have

Authors

Kamil Veli Toraman: kvtoraman

License

There is no license for now. You can use it as you please. This code takes a rule-based approach to parsing movie scripts. If you know a better way, please let me know :)

Acknowledgments

This is the result of a 2-month internship at the Data Science Lab, KAIST.
