survivoR2py: Survivor data for Python users

About the project

The code in this repository converts the data files in an R package devoted to the Survivor television series from .rda to .csv format so that Python users can enjoy them. There are also scripts here for downloading and parsing show transcripts. That second process isn't fully baked and is living here until it finds a home elsewhere.

Sources

The data comes from the survivoR project created by Daniel Oehm et al. They have organized and created numerous detailed and useful datasets about the history of the show, including an episode summary, castaway listing, challenge results, and vote history. Transcripts come from subslikescript and CBS/Paramount.

Processes

Two scripts handle the processing: one converts the survivoR data files, and the other fetches and parses episode transcripts.

Convert survivoR data

scripts/convert_data.py: This script converts the survivoR data by fetching all the latest .rda files from the source, storing copies locally in data/raw/rda, and converting them to comma-delimited text files in data/raw/csv.
- See the original repo for metadata about the individual files.
- Note: Apart from the format change, the data downloaded, processed, and stored in the raw directory is unchanged from the original repo.
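
As a rough illustration of the conversion step, here is a minimal sketch rather than the actual contents of scripts/convert_data.py. It assumes the third-party pyreadr package is installed and that the .rda files have already been downloaded to data/raw/rda. The function names csv_path_for and convert_rda are hypothetical.

```python
from pathlib import Path

# Directories named in this README.
RDA_DIR = Path("data/raw/rda")
CSV_DIR = Path("data/raw/csv")


def csv_path_for(object_name, csv_dir=CSV_DIR):
    """Return the CSV path for one R object (hypothetical helper)."""
    return Path(csv_dir) / f"{object_name}.csv"


def convert_rda(rda_path, csv_dir=CSV_DIR):
    """Write every data frame stored in one .rda file to its own CSV file.

    Assumption: pyreadr is the reader; the real script may use another tool.
    """
    import pyreadr  # third-party, imported lazily so the helper above works without it

    Path(csv_dir).mkdir(parents=True, exist_ok=True)
    written = []
    # pyreadr.read_r returns a dict-like mapping of object name -> pandas DataFrame.
    for name, frame in pyreadr.read_r(str(rda_path)).items():
        out_path = csv_path_for(name, csv_dir)
        frame.to_csv(out_path, index=False)  # content is left untouched
        written.append(out_path)
    return written
```

Calling convert_rda on each file returned by RDA_DIR.glob("*.rda") would convert the whole directory. Each CSV is named after the R object stored in the .rda file, which may or may not match the naming used in this repository.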

Fetch transcripts

scripts/fetch_transcripts.py: This script collects all episode transcript URLs, converts the URLs to metadata (episode number, season, episode title, URL, etc.), fetches the full transcript for each episode, and parses the text for what contestants said after Jeff's famous line, "The tribe has spoken." All of it is stored in a dataframe and exported to CSV and JSON. The goal is to refine the dataset enough so it might be useful to offer back to the survivoR folks.
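
Two of the steps described above, turning a URL into metadata and pulling out the text that follows Jeff's line, might look something like the sketch below. It is not the code in scripts/fetch_transcripts.py. The URL shape (a path ending in season-N/episode-N-Title_With_Underscores) is an assumption, and the function names and the 300-character window are hypothetical choices.

```python
import re

# Assumed URL ending: /season-<n>/episode-<n>-<Title_With_Underscores>
URL_PATTERN = re.compile(
    r"/season-(?P<season>\d+)/episode-(?P<episode>\d+)-(?P<title>[^/]+)/?$"
)

# Jeff's line, matched case-insensitively with an optional trailing period.
PHRASE = re.compile(r"the tribe has spoken\.?", re.IGNORECASE)


def url_to_metadata(url):
    """Parse season, episode number, and title from a transcript URL."""
    match = URL_PATTERN.search(url)
    if match is None:
        return None  # the URL does not follow the assumed pattern
    return {
        "season": int(match["season"]),
        "episode": int(match["episode"]),
        "title": match["title"].replace("_", " "),
        "url": url,
    }


def text_after_phrase(transcript, max_chars=300):
    """Return up to max_chars of text following each occurrence of the line."""
    snippets = []
    for match in PHRASE.finditer(transcript):
        following = transcript[match.end():match.end() + max_chars]
        following = " ".join(following.split())  # collapse line breaks and spaces
        if following:
            snippets.append(following)
    return snippets
```

A fixed character window is a blunt instrument: it can cut a sentence short or pick up narration that is not the contestant speaking, which is part of why the resulting dataset needs refinement.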

Questions? Corrections?

Please let me know.

