Nexis_uni_parser

Parser for Nexis Uni documents

Description

This repository provides functions to parse DOCX files retrieved from Nexis Uni. It parses the following information:

Date
Author
Title
Body Text
Newspaper

Additionaly, it returns the sentiment probabilities (neutral, positive, negative) from the article body text.

How it works

News articles from Nexis Uni are downloaded in a format (DOCX, RTF, or PDF) that is not ideal for quantitative analyses. Files can contain up to 100 news articles like the following one:

Article (example)

Two DOCX examples can be find in the /data folder:

/data/example1.DOCX
/data/example2.DOCX

The function etl_all_docs in nexis_uni.py takes a folder path, reads all .DOCX files and parses all the relevant information described in the description. The data is returned in a csv file.

Example: We want to read all DOCX files in the folder ~/data and store the resulting csv as parsed_articles.csv in the same folder. We can execute the following code:

path_input = "~/data"
file_path_output = "~/data/parsed_articles.csv"

etl_all_docs(path_input, file_path_output)

Resulting CSV (example)

Potential Issues

Please note that there might be some issues with dates formatting and additional work might be needed. Different media outlets provide different date formats.
Sentiment classification is only available in Spanish at the moment but the code can be easily changed to other languages (check the helper function get_text_analytics)

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
data		data
LICENSE		LICENSE
README.md		README.md
nexis_uni.py		nexis_uni.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Nexis_uni_parser

Description

How it works

Article (example)

Resulting CSV (example)

Potential Issues

About

Releases

Packages

Languages

License

seolub/Nexis_uni_parser

Folders and files

Latest commit

History

Repository files navigation

Nexis_uni_parser

Description

How it works

Article (example)

Resulting CSV (example)

Potential Issues

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages