
Wikisource Book Analyzer

This code was written in two days as part of my recruiting process at SAP in November 2018.

The Task

  • Analyze the book “War and Peace” by Leo Tolstoy by means of data science and machine learning algorithms.

  • Please use the document “War and Peace” by Leo Tolstoy at https://en.wikisource.org/wiki/War_and_Peace/Book_One. Download the text version using the “Choose format” link on the left side.

  • Your analysis has to be repeatable. That means the script/program used for the analysis can be applied to the document in question and produce the same results. If you use additional sources in your analysis, please document those sources and their retrieval.

  • You can pick a development environment and programming language of your choice.

  • The only requirement is that you can explain your analysis in 25 minutes (including questions and answers).

  • Data science often doesn’t start with clear questions. The following is a list of ideas for what could be analyzed.

    • Word and phrase distribution.
    • Identify all the places and characters in the book.
    • Identify sentiments of the different sentences and chapters.
    • Summarize paragraphs.
    • Extract intents from sentences.
    • Track key concepts throughout the book.
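
As an illustration of the first idea (word and phrase distribution), here is a minimal sketch using only the Python standard library. It is not the code from this repository; the function names `tokenize` and `top_ngrams` and the file name `war_and_peace.txt` are assumptions made for the example. Because it has no random components, repeated runs on the same text give the same counts, which fits the repeatability requirement.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return word tokens (letters, optional inner apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def top_ngrams(tokens, n=1, k=10):
    """Return the k most frequent n-grams as (tuple_of_words, count) pairs."""
    # zip over n shifted copies of the token list yields consecutive n-grams.
    ngrams = zip(*(tokens[i:] for i in range(n)))
    return Counter(ngrams).most_common(k)


if __name__ == "__main__":
    # Hypothetical file name: the text version downloaded from Wikisource.
    with open("war_and_peace.txt", encoding="utf-8") as f:
        tokens = tokenize(f.read())

    print("Most frequent words:")
    for (word,), count in top_ngrams(tokens, n=1, k=10):
        print(f"  {word}: {count}")

    print("Most frequent two-word phrases:")
    for phrase, count in top_ngrams(tokens, n=2, k=10):
        print(f"  {' '.join(phrase)}: {count}")
```

In practice the most frequent words will be dominated by function words such as "the" and "and", so filtering a stop-word list before counting is a common next step.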
