
Homework notebooks for The Art of Analyzing Big Data - The Data Scientist’s Toolbox.


nevoit/Analyzing-Big-Data-Course


Analyzing-Big-Data-Course

The Art of Analyzing Big Data - The Data Scientist’s Toolbox.

Getting Started

  1. Get a free API key at https://www.kaggle.com/ (My Account -> Create New API Token). See the tutorial at https://github.com/Kaggle/kaggle-api#api-credentials.
  2. Add your API token by filling in your username and key in this line:
     api_token = {"username":"","key":""}
  3. Run each .ipynb file using Colab (https://colab.research.google.com/) or Jupyter.
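The Kaggle client reads credentials from a `kaggle.json` file in `~/.kaggle` (or in the directory named by the `KAGGLE_CONFIG_DIR` environment variable). A minimal sketch of turning the `api_token` dictionary from step 2 into that file, assuming a Linux environment such as Colab; the helper `write_kaggle_credentials` is hypothetical and not taken from the notebooks:

```python
import json
import os
import stat
from pathlib import Path

# Placeholder values: paste the username and key from the kaggle.json you downloaded.
api_token = {"username": "", "key": ""}


def write_kaggle_credentials(token, config_dir=None):
    """Save the token where the Kaggle client looks for it and restrict its permissions.

    Hypothetical helper, not part of the Kaggle package.
    """
    if config_dir is None:
        # The Kaggle client honours KAGGLE_CONFIG_DIR, otherwise it uses ~/.kaggle
        config_dir = os.environ.get("KAGGLE_CONFIG_DIR", str(Path.home() / ".kaggle"))
    config_path = Path(config_dir)
    config_path.mkdir(parents=True, exist_ok=True)
    cred_file = config_path / "kaggle.json"
    cred_file.write_text(json.dumps(token))
    # Owner read/write only (0o600); the client warns when the key is readable by others.
    os.chmod(cred_file, stat.S_IRUSR | stat.S_IWUSR)
    return cred_file
```

Calling `write_kaggle_credentials(api_token)` once per session should be enough before running a command such as `kaggle datasets download -d <owner>/<dataset>` from a notebook cell.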

Examples

Assignment 1

Assignment 2

Assignment 3

Assignment 4

Assignment 5

Assignment 6

Task: Select a reviews dataset and create a sentiment classifier that uses word embeddings. Evaluate the classifier, then try to improve it by adding more features.

Dataset: Coursera's Course Reviews Dataset

[Figure: ROC curve]
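One common way to build such a classifier is to represent each review as the average of its word vectors, fit a logistic regression on those features, and score it with the area under the ROC curve. The sketch below is not the notebook's code: it assumes tiny hypothetical 3-dimensional embeddings in place of pretrained vectors (e.g. Word2Vec) and uses plain Python gradient descent instead of a library model.

```python
import math

# Hypothetical toy embeddings; a real notebook would load pretrained vectors (e.g. Word2Vec).
EMBEDDINGS = {
    "great": [0.9, 0.1, 0.0],
    "excellent": [0.8, 0.2, 0.1],
    "helpful": [0.7, 0.3, 0.0],
    "boring": [-0.8, 0.1, 0.2],
    "bad": [-0.9, 0.0, 0.1],
    "confusing": [-0.7, 0.2, 0.3],
    "course": [0.0, 0.5, 0.5],
}
DIM = 3


def review_vector(text):
    """Average the embeddings of known words; return a zero vector if none are known."""
    vecs = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability that the review with features x is positive."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit logistic regression with full-batch gradient descent on the cross-entropy loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of the loss w.r.t. the logit
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def roc_auc(labels, scores):
    """AUC = chance a random positive scores above a random negative (ties count half)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    reviews = ["great course", "excellent and helpful", "boring course", "bad and confusing"]
    labels = [1, 1, 0, 0]
    X = [review_vector(r) for r in reviews]
    w, b = train_logreg(X, labels)
    print("AUC:", roc_auc(labels, [predict_proba(w, b, x) for x in X]))
```

On real data the AUC should be computed on a held-out split rather than on the training reviews as in this toy run. Additional features (for example review length) could be appended to the vector returned by `review_vector` before training.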

Assignment 7

Task: Select two books and construct networks of people and locations, i.e., each link connects a person to a location. Visualize the network using Cytoscape or Gephi.

Dataset: Dickens

Book Name: Little Dorrit

[Figure: Gephi network visualization]
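A person-location network can be built by linking every person and location that appear in the same sentence, weighting each link by the number of such sentences. A minimal stdlib sketch, assuming sentence-level co-occurrence and hand-picked entity sets (a fuller pipeline might take them from spaCy's PERSON and GPE/LOC entity labels); it writes an edge table with `Source`, `Target` and `Weight` columns, which Gephi's spreadsheet import can read as edges. The helper names are hypothetical.

```python
import csv
from collections import Counter
from itertools import product

# Hypothetical hand-picked entities; a real run might extract them with spaCy NER.
PEOPLE = {"Amy", "Arthur", "Flora"}
LOCATIONS = {"Marshalsea", "London", "Venice"}


def person_location_edges(sentences, people, locations):
    """Count how many sentences mention each (person, location) pair."""
    weights = Counter()
    for sentence in sentences:
        # Strip surrounding punctuation so "London." matches "London"
        tokens = {t.strip(".,;:!?\"'") for t in sentence.split()}
        for person, place in product(sorted(tokens & people), sorted(tokens & locations)):
            weights[(person, place)] += 1
    return weights


def write_gephi_edge_csv(weights, path):
    """Write a Source/Target/Weight edge table for import into Gephi."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target", "Weight"])
        for (person, place), weight in sorted(weights.items()):
            writer.writerow([person, place, weight])
```

For example, `person_location_edges(sentences, PEOPLE, LOCATIONS)` on a list of the book's sentences, followed by `write_gephi_edge_csv(weights, "little_dorrit_edges.csv")`, produces a file that can be loaded into Gephi and laid out as a bipartite network.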

Assignment 8

Task: Select a country-level statistic from the World Development Indicators dataset (note that the dataset contains several files, such as Indicators.csv). Then create a choropleth map showing how the selected statistic changed over time, and a short animation that displays this change.

Dataset: World Development Indicators

[Figure: word_exp]
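Before drawing the map, the long Indicators.csv table has to be reshaped into one frame per year. A stdlib sketch, assuming the column names `CountryCode`, `IndicatorCode`, `Year` and `Value` of the Kaggle version of the dataset; the function names are hypothetical:

```python
import csv
from collections import defaultdict


def frames_by_year(csv_file, indicator_code):
    """Group one indicator's values into {year: {country_code: value}}, sorted by year.

    Assumes the Kaggle WDI Indicators.csv columns CountryCode, IndicatorCode, Year, Value.
    """
    frames = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        if row["IndicatorCode"] == indicator_code:
            frames[int(row["Year"])][row["CountryCode"]] = float(row["Value"])
    return dict(sorted(frames.items()))


def global_range(frames):
    """Min and max over all years, so every animation frame shares one color scale."""
    values = [v for frame in frames.values() for v in frame.values()]
    return min(values), max(values)


def color_bin(value, lo, hi, n_bins=5):
    """Map a value to a color class 0..n_bins-1 on the fixed [lo, hi] scale."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * n_bins), n_bins - 1)
```

Each year's dictionary can then be drawn as one choropleth frame (for example with Altair or KeplerGL from the tools list). Using `color_bin` with the range from `global_range`, rather than rescaling each year, keeps a given color meaning the same value in every frame of the animation.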

Assignment 9

Task: Select a short video with at least three people and create a new movie from it with a face tracker (each person’s face must be tracked by a rectangle of a different color). See, for example, the video at https://github.com/ageitgey/face_recognition.
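The main bookkeeping in this task is keeping each person's color stable across frames. A minimal sketch, assuming each detected face has already been given an identity (for example by matching `face_recognition.face_encodings` against known faces with `face_recognition.compare_faces`) and that boxes use face_recognition's `(top, right, bottom, left)` order; the `ColorAssigner` class and the palette are hypothetical:

```python
# Hypothetical BGR palette (OpenCV's channel order); colors repeat after six people.
PALETTE = [(0, 0, 255), (0, 255, 0), (255, 0, 0), (0, 255, 255), (255, 0, 255), (255, 255, 0)]


class ColorAssigner:
    """Give each identity a fixed color the first time it is seen."""

    def __init__(self, palette=PALETTE):
        self.palette = palette
        self.colors = {}

    def color_for(self, name):
        if name not in self.colors:
            self.colors[name] = self.palette[len(self.colors) % len(self.palette)]
        return self.colors[name]


def rectangles_for_frame(detections, assigner):
    """Turn (name, (top, right, bottom, left)) detections into
    (top_left, bottom_right, color) triples for drawing."""
    rects = []
    for name, (top, right, bottom, left) in detections:
        rects.append(((left, top), (right, bottom), assigner.color_for(name)))
    return rects
```

Each returned triple can be passed to OpenCV as `cv2.rectangle(frame, top_left, bottom_right, color, 2)` before the frame is written to the output video. Keeping one `ColorAssigner` for the whole video is what makes a person's rectangle keep its color from frame to frame.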

Question 1

Task: Select a collection of connected images. Create a graph of the links among the images or among the objects in them. Use graph algorithms to discover interesting insights about the images.

Dataset: simpsons-characters
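One way to approach this is a co-occurrence graph: characters are nodes, and two characters are linked when they appear in the same image. A stdlib sketch, assuming each image has already been annotated with the set of characters in it (the image IDs and labels are hypothetical); it computes weighted degree (who shares images with others most often) and connected components (groups of characters linked through shared images):

```python
from collections import deque
from itertools import combinations


def cooccurrence_graph(image_labels):
    """image_labels: {image_id: set of characters}.
    Returns {node: {neighbor: number of shared images}}."""
    graph = {}
    for labels in image_labels.values():
        for name in labels:
            graph.setdefault(name, {})  # characters seen alone still become nodes
        for a, b in combinations(sorted(labels), 2):
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return graph


def weighted_degree(graph):
    """Sum of edge weights per node."""
    return {node: sum(nbrs.values()) for node, nbrs in graph.items()}


def connected_components(graph):
    """Breadth-first search from each unvisited node, in sorted node order."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return components
```

With NetworkX from the tools list, `nx.connected_components` and `G.degree(weight="weight")` would give the same results on an equivalent weighted graph.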

Question 2

Assignment 10

Tools And Packages

  • Advanced SQL
  • Pandas
  • Matplotlib
  • TuriCreate
  • Seaborn
  • Altair
  • NetworkX
  • igraph
  • Cytoscape
  • Gephi
  • D3
  • spaCy
  • Topic Modeling
  • Word Embeddings (Word2Vec, BERT)
  • Sentiment Analysis
  • MongoDB
  • kepler.gl
  • AWS
  • Microsoft Azure
  • Google Cloud Platform
  • Spark
  • Hadoop
  • Dask and MLlib
