# File Formats

There are thousands of file formats for different purposes, for example:

- Audio files
- Binary files
- Database files
- Documents
- Image files
- Source code
- Video files
- ...

In this workshop we will focus on markup and generic text-based files used
all over the web.

## Materials & Resources

| Material                                                                                              | Time |
|:------------------------------------------------------------------------------------------------------|-----:|
| [Files & File Systems: Crash Course Computer Science #20](https://youtu.be/KN8YgJnShPM) (*till 4:45*) | 4:45 |
| [Text Files (Part 1) What is a text file?](https://youtu.be/H7R0LN41N8c)                              | 3:56 |
| [5 Minute Metadata - What is a CSV?](https://youtu.be/_blfh7uR05A)                                    | 4:42 |
| [XML Tutorial for Beginners \| What is XML \| Learn XML](https://youtu.be/KeLiQXqVgMI)                | 6:38 |
| [What is JSON? - 3 Minutes of Code](https://youtu.be/sSL2to7Jg5g)                                     | 2:33 |

## Material Review

- What is XML?
  <!--
    It stands for eXtensible Markup Language, a simple text-based format for
    storing and transporting data.
  -->
- How is data stored in XML files?
  <!--
    Data is stored between an opening and a closing tag. Elements can follow each
    other and be nested, so XML can represent arbitrarily complex structures. It
    looks similar to HTML, but XML has no predefined tags, which makes it flexible
    and customizable.
  -->
- Can we nest data in XML files?
  <!--
    Just like in HTML, any tag can contain zero or more other tags, so you can
    nest data.
    <employee>
      <name>...</name>
      <department>...</department>
    </employee>
  -->
- How can we attach extra information (metadata) to the data in XML?
  <!--
    We can define attributes on the tags, just like in HTML.
  -->
- What is the CSV format used for?
  <!--
    CSV (comma-separated values) describes data as a table with rows and
    columns. Each line is a row, and the columns are separated by commas.
  -->
- What is the difference between CSV and TSV?
  <!--
    In TSV the columns are separated with tabs.
  -->
- Are there different types of CSV?
  <!--
    Yes. Some files separate the columns with a semicolon instead, to avoid
    confusion when a value itself contains a comma. Values that contain the
    separator can also be wrapped in double quotes.
  -->
- What is JSON?
  <!--
    JavaScript Object Notation, a widely used format for transferring and storing
    data. Its syntax is derived from JavaScript object literals.
  -->
- What are the valid data types in JSON?
  <!--
    Array, boolean, number, string, null, object
  -->
- What is the benefit/drawback of JSON over XML?
  <!--
    JSON is usually more compact, so it takes less space.
    XML can attach metadata to values, for example through attributes.
  -->
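To make the review questions concrete, here is a minimal sketch in Python (our choice; the workshop does not prescribe a language) that reads the same hypothetical employee record from comma-, tab- and semicolon-separated text, from JSON and from XML, using only the standard library. The record, its `id` attribute and the helper names `from_delimited` and `from_xml` are made up for illustration.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same hypothetical employee record in several text formats.
CSV_TEXT = "name,department\nAlice,Sales\n"
TSV_TEXT = "name\tdepartment\nAlice\tSales\n"
SEMICOLON_TEXT = "name;department\nAlice;Sales\n"
# A value containing the separator is wrapped in double quotes.
QUOTED_TEXT = 'name,department\n"Smith, Alice",Sales\n'
JSON_TEXT = '{"name": "Alice", "department": "Sales"}'
XML_TEXT = '<employee id="1"><name>Alice</name><department>Sales</department></employee>'


def from_delimited(text: str, delimiter: str) -> dict:
    """Parse the first data row of a delimited text (header row first)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return dict(next(reader))


def from_xml(text: str) -> dict:
    """Read the nested child elements and the id attribute of <employee>."""
    root = ET.fromstring(text)
    record = {child.tag: child.text for child in root}
    # Attributes are read separately from child elements.
    record["id"] = root.get("id")
    return record


csv_record = from_delimited(CSV_TEXT, ",")
tsv_record = from_delimited(TSV_TEXT, "\t")
semicolon_record = from_delimited(SEMICOLON_TEXT, ";")
quoted_record = from_delimited(QUOTED_TEXT, ",")
json_record = json.loads(JSON_TEXT)
xml_record = from_xml(XML_TEXT)

print(csv_record)
print(quoted_record)
print(xml_record)
```

Only the `delimiter` argument changes between CSV, TSV and the semicolon variant, while the quoted value keeps its comma. The XML version carries one extra piece of information, the `id` attribute.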

## Workshop

### Find oldest movie

Read this [input file](./movies.csv) and print the title of the oldest movie.
The file has the following columns:

- Title
- Year
- Director

### Remove useless data

In [this file](./election.csv) you can find the raw data of a public election.
Unfortunately, something went wrong and some rows cannot be used because a
value is missing. Remove these rows and then print the remaining rows to the
console. Columns (mandatory fields are marked with *):

- Name *
- Candidate *
- Time
- State *

### Find the post with the most popular comments

You can find some posts and their comments in [this file](./posts.json). Find
the post with the most popular comments, where popularity is the total number
of likes across all of the post's comments.
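A sketch of the "sum, then maximum" idea in Python. It assumes `posts.json` holds a list of posts, each with a `comments` list whose items have a numeric `likes` field. These key names are guesses, so adjust them to the real file. The helpers `comment_likes` and `most_popular` are hypothetical.

```python
import json

# Hypothetical structure: a list of posts, each with a "comments" list whose
# items carry a numeric "likes" field. Check posts.json for the real keys.
SAMPLE = """[
  {"title": "First post", "comments": [{"likes": 3}, {"likes": 5}]},
  {"title": "Second post", "comments": [{"likes": 10}]},
  {"title": "Quiet post", "comments": []}
]"""


def comment_likes(post: dict) -> int:
    """Total likes across all comments of one post (0 if it has none)."""
    return sum(comment["likes"] for comment in post.get("comments", []))


def most_popular(posts: list) -> dict:
    """Return the post whose comments collected the most likes in total."""
    return max(posts, key=comment_likes)


posts = json.loads(SAMPLE)
best = most_popular(posts)
print(best["title"], comment_likes(best))
```

For the real data, load it with `json.load(open("posts.json"))` (or inside a `with` block) instead of `json.loads(SAMPLE)`.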

### USD transactions

In [transactions.xml](./transactions.xml) you can find money transfers. Your
task is to filter all USD transactions and print them to the console in a
user-friendly format.
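Here is a sketch with Python's built-in `xml.etree.ElementTree`. It assumes each `<transaction>` stores its currency in a `currency` attribute and its details in `<from>`, `<to>` and `<amount>` child elements. This layout is hypothetical, so inspect `transactions.xml` and adapt the tag and attribute names.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: currency stored as an attribute, details as child elements.
SAMPLE = """<transactions>
  <transaction currency="USD"><from>Alice</from><to>Bob</to><amount>120.50</amount></transaction>
  <transaction currency="EUR"><from>Carol</from><to>Dave</to><amount>80</amount></transaction>
  <transaction currency="USD"><from>Erin</from><to>Frank</to><amount>15</amount></transaction>
</transactions>"""


def usd_transactions(root: ET.Element) -> list:
    """Keep only <transaction> elements whose currency attribute is USD."""
    return [t for t in root.iter("transaction") if t.get("currency") == "USD"]


def describe(t: ET.Element) -> str:
    """Format one transaction as a readable sentence."""
    amount = float(t.findtext("amount"))
    return f"{t.findtext('from')} sent ${amount:,.2f} to {t.findtext('to')}"


root = ET.fromstring(SAMPLE)
for t in usd_transactions(root):
    print(describe(t))
```

With the real file, `ET.parse("transactions.xml").getroot()` gives you the root element instead of `ET.fromstring(SAMPLE)`.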

### Exam performance

Here are the fictional [results](./exams.tsv) of an exam. The examiners tracked
the user id, the result and the time spent on the exam. There was no standard
time format, so each mentor used their own. Find the user who scored the most
points per minute, that is, the highest points/minutes ratio in the dataset.
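A sketch of the general shape in Python, assuming tab-separated rows of user id, points and time spent, with no header. The three time notations handled by the hypothetical `to_minutes` helper ("H:MM", "N min", "N sec") are pure assumptions. The real file will likely need other patterns, and the function raises an error on anything it does not recognize rather than guessing.

```python
import csv
import io
import re

# Hypothetical sample: user id, points, time spent (tab separated, no header).
# The time notations are assumptions; extend to_minutes() for the real file.
SAMPLE = "u1\t80\t40 min\nu2\t90\t1:30\nu3\t60\t1200 sec\n"


def to_minutes(text: str) -> float:
    """Convert a few assumed time notations into minutes."""
    text = text.strip().lower()
    if match := re.fullmatch(r"(\d+):(\d{2})", text):  # assumed H:MM
        return int(match[1]) * 60 + int(match[2])
    if match := re.fullmatch(r"(\d+(?:\.\d+)?)\s*min", text):
        return float(match[1])
    if match := re.fullmatch(r"(\d+(?:\.\d+)?)\s*sec", text):
        return float(match[1]) / 60
    raise ValueError(f"unknown time format: {text!r}")


def best_ratio(lines):
    """Return (user id, points per minute) for the highest ratio."""
    best = None
    for user_id, points, spent in csv.reader(lines, delimiter="\t"):
        ratio = float(points) / to_minutes(spent)
        if best is None or ratio > best[1]:
            best = (user_id, ratio)
    return best


print(best_ratio(io.StringIO(SAMPLE)))
```

Normalizing every time value to a single unit first keeps the ratio comparison trivial. All the format-specific work lives in one function that you can grow as you discover the notations used in the file.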

### Transform data

- Transform [users.csv](./users.csv) into `json` and save it.
- Transform [flowers.json](./flowers.json) into `xml` and save it.
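Both transformations can be sketched with the Python standard library. The sketch assumes `users.csv` has a header row and `flowers.json` is a list of flat objects. The sample data, the helper names and the `flowers`/`flower` tag names are hypothetical choices, not part of the task.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical inputs: users.csv has a header row, flowers.json is a list of flat objects.
USERS_CSV = "id,name\n1,Alice\n2,Bob\n"
FLOWERS_JSON = '[{"name": "Rose", "color": "red"}, {"name": "Tulip", "color": "yellow"}]'


def csv_to_json(text: str) -> str:
    """Each CSV row becomes one JSON object keyed by the header names."""
    rows = [dict(row) for row in csv.DictReader(io.StringIO(text))]
    return json.dumps(rows, indent=2)


def json_to_xml(text: str, root_tag: str = "flowers", item_tag: str = "flower") -> str:
    """Each JSON object becomes an element; each key becomes a child element."""
    root = ET.Element(root_tag)
    for item in json.loads(text):
        element = ET.SubElement(root, item_tag)
        for key, value in item.items():
            ET.SubElement(element, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


print(csv_to_json(USERS_CSV))
print(json_to_xml(FLOWERS_JSON))
```

CSV has no data types, so every value comes out of `csv_to_json` as a string (`"1"`, not `1`); convert fields explicitly if the JSON should contain numbers. The XML conversion also assumes every JSON key is a valid XML tag name. To save the results, write the returned strings to files with `open(..., "w")`.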