master-ds2e/NoSQL

Infrastructures de données: NoSQL

Instructor

Contact:

About me:

  • University degree at Strasbourg (Statistics and Econometrics).
  • PhD in Economics. (Bibliometrics, novelty, collaboration network, ...)
  • R since 2016 and Python since 2018.
  • Working with SQL and NoSQL databases.

Syllabus

The aim of this course is to teach students how to store and process non-relational data. The right storage system depends on the size of the data and on the problem at hand, so you will learn to choose a suitable format for the non-relational data you have. Throughout the course we will use Python as our programming language. We start with basic unstructured formats such as JSON, XML, and dictionaries. Next we study two of the most widely used NoSQL databases: MongoDB and Neo4j. We finish with a short presentation of other databases.
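As a rough illustration of the basic formats mentioned above, the following minimal sketch moves one record between a Python dictionary, JSON text, and an XML tree using only the standard library. The record and the element names (`course`, `title`, `tools`, `tool`) are hypothetical, chosen for this example, and are not taken from the course material.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, invented for illustration only.
record = {"title": "NoSQL course", "tools": ["MongoDB", "Neo4j"]}

# Python dictionary -> JSON text, then back to a dictionary.
as_json = json.dumps(record)
restored = json.loads(as_json)

# The same record expressed as an XML tree.
root = ET.Element("course")
ET.SubElement(root, "title").text = record["title"]
tools = ET.SubElement(root, "tools")
for name in record["tools"]:
    ET.SubElement(tools, "tool").text = name

# Serialize the tree to a plain string (no XML declaration).
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)
print(as_xml)
```

JSON maps directly onto dictionaries and lists, whereas XML requires choosing element names for each level of nesting, which is one reason the choice of format depends on the data at hand.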

The course does not aim to be exhaustive. Programming is a vast field, and you will sometimes do something I have never done before. In the end you will learn what I know, and I will learn from your questions. This also means that I cannot have the answer to every problem: the basic reflex in programming is to research the issue yourself first. Websites like stackoverflow/stackexchange/Quora/Youtube/Github/... are your best friends when it comes to solving issues. If you are still lost after your own research, send me a message describing the problem. Either I will have the solution because I have encountered it before, or I will not, but I can guide you towards it. In no case should you ask me to write the code for you, it's your job! https://www.youtube.com/watch?v=HluANRwPyNo

Program

20 hours of face-to-face lectures

  • Introduction

  • Chapter 1: SQL

  • Chapter 2: Basic non-relational data formats

  • Chapter 3: MongoDB

  • Chapter 4: Neo4j

  • Chapter 5: Other alternatives.

Prerequisites

  • Prior knowledge of Python and familiarity with programming concepts are required.
  • A laptop connected to the internet and running Windows, Linux, or macOS
  • Anaconda or Visual Studio Code installed, see below. Choose one of the IDEs (I'll be using Spyder and Jupyter-Notebook)
  • One text editor (Sublime, Atom, Vim, ...)

If you have little experience with Python or shell programming, the following tutorials may be helpful:

Preparation before the First Session

Preparation before Chapter I

Preparation before Chapter III

Preparation before Chapter IV

First, you need to install Java.

Then install Neo4j Desktop.

Once everything is installed, launch Neo4j Desktop; you should be able to connect to the default database (Movie).

Grading

You will receive a single grade at the end of the semester.

Dossier

Date: 26/04/2024. It's not a "dossier" per se:

  • Each chapter contains a few "todos". You will send me these todos, and they will count as a bonus/malus towards the grade. This is individual work, mainly meant to assess your participation, since most of them will be worked on and discussed during the course.

  • At the end of Chapters II and III there are a few homework assignments. You will send me two chosen assignments per chapter before 23/04/2023. This is group work (solo or at most 2 people) and will make up your main grade.

  • I recommend using Jupyter-Notebook (more on that later).

Resources

About

The 2020-2021 NoSQL course
