This tool fetches books from en.wikisource.org and analyzes them via Natural Language Processing.

its-leo/Wikisource-Book-Analyzer

Wikisource Book Analyzer

I wrote this code in two days as part of my recruiting process at SAP in November 2018.

The Task

  • Analyze the book “War and Peace” by Leo Tolstoy by means of Data Science and Machine Learning algorithms.

  • Please use the document “War and Peace” by Leo Tolstoy at https://en.wikisource.org/wiki/War_and_Peace/Book_One and download the text version using the “Choose format” link on the left side.

  • Your analysis has to be repeatable. That means the script/program used for the analysis can be applied to the document in question and produce the same results. If you use additional sources in your analysis, please document those sources and their retrieval.

  • You can pick a development environment and programming language of your choice.

  • The only requirement is that you can explain your analysis in 25 minutes (including questions and answers).

  • Data Science often doesn’t start with clear questions. The following is a list of ideas for what could be analyzed.

    • Word and phrase distribution.
    • Identify all the places and characters in the book.
    • Identify sentiments of the different sentences and chapters.
    • Summarize paragraphs.
    • Extract intents from sentences.
    • Track key concepts throughout the book.
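
As an illustration of the first idea in the list, word and phrase distribution, here is a minimal sketch using only the Python standard library. It is not taken from this repository: the helper names `tokenize` and `ngram_counts` are hypothetical, and the sample sentence is a made-up stand-in for the downloaded book text. Because it uses no randomness, the same input always yields the same counts, which matches the repeatability requirement.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens, keeping inner apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def ngram_counts(tokens, n=1):
    """Count contiguous n-token sequences; n=1 gives the plain word distribution."""
    # zip over n shifted views of the token list yields every n-gram as a tuple
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Made-up stand-in text (an assumption); replace with the downloaded book text.
sample = "The prince spoke and the princess listened to the prince"
tokens = tokenize(sample)

word_counts = ngram_counts(tokens, n=1)
phrase_counts = ngram_counts(tokens, n=2)

print(word_counts.most_common(3))
print(phrase_counts.most_common(3))
```

For a full book, stop-word filtering would likely be needed before the most frequent words become informative; that step is left out here to keep the sketch short.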

Some pictures

[Four images; no captions available.]
