# Data formats and data structures

## Data -> Dataset -> Data structure

![data-structure](../img/data_set_structure.png)

## Relational data structure

- Data stored in tables
- Each table has only two dimensions (rows and columns)

![relational](../img/expanse_relational.png)
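
As a minimal sketch of the idea, the two tables below are plain Python lists of rows, each with a fixed set of columns. The `ships` table and the `ship_id` key are hypothetical, added only to illustrate how flat tables can be linked through a shared column.

```python
# Two flat tables: every row has the same columns, so each table is two-dimensional.
# The ship_id column and the ships table are illustrative assumptions.
characters = [
    {"name": "James", "vocation": "captain", "ship_id": 1},
    {"name": "Naomi", "vocation": "technician", "ship_id": 1},
]
ships = [
    {"ship_id": 1, "ship_name": "Rocinante", "ship_class": "Corvette"},
]

# Index the ships table by its key so rows can be looked up directly.
ships_by_id = {ship["ship_id"]: ship for ship in ships}

# Join: combine each character row with the matching ship row.
joined = [{**person, **ships_by_id[person["ship_id"]]} for person in characters]

for row in joined:
    print(row["name"], row["ship_name"])
```

The joined result is still a flat table: extra detail shows up as more columns or more tables, not as deeper nesting.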

## Hierarchical data structure

- Data stored in a tree-like structure
- Not limited to two dimensions (each branch can have as many sub-branches as needed)

![hierarch](../img/expanse_hierarchical.png)
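
A small sketch of what "as many sub-branches as needed" can look like in Python, using nested dictionaries. The `depth` helper is a hypothetical function written for this illustration; it only measures how deeply a branch is nested.

```python
# A tree-like structure: branches can nest to different depths.
tree = {
    "James": {"vocation": "captain", "ship": {"name": "Rocinante", "class": "Corvette"}},
    "Naomi": {"vocation": "technician"},
}


def depth(node):
    """Return the nesting depth of a dict-based tree (a non-dict leaf has depth 0)."""
    if not isinstance(node, dict) or not node:
        return 0
    return 1 + max(depth(child) for child in node.values())


print(depth(tree["James"]))  # 2
print(depth(tree["Naomi"]))  # 1
```

The two branches have different depths, which a single two-dimensional table cannot express directly.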

## List of lists

In [33]:
names = ["James", "Naomi", "Alex", "Clarissa"]
ethnicities = ["earther", "belter", "martian", "earther"]
vocation = ["captain", "technician", "pilot", "mechanic"]

data = [names, ethnicities, vocation]

In [34]:
data

[['James', 'Naomi', 'Alex', 'Clarissa'],
 ['earther', 'belter', 'martian', 'earther'],
 ['captain', 'technician', 'pilot', 'mechanic']]

In [35]:
[items[0] for items in data]

['James', 'earther', 'captain']
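
In this layout each inner list is a column, so one person's values are spread across all the lists. One possible way to get one list per person instead is to transpose with `zip`; this is a sketch, not the only approach.

```python
names = ["James", "Naomi", "Alex", "Clarissa"]
ethnicities = ["earther", "belter", "martian", "earther"]
vocation = ["captain", "technician", "pilot", "mechanic"]
data = [names, ethnicities, vocation]

# zip(*data) pairs up the i-th item of every inner list,
# which turns the column lists into row lists.
rows = [list(row) for row in zip(*data)]

print(rows[0])  # ['James', 'earther', 'captain']
```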

## Dictionary (JSON)

In [36]:
data = {
    "James": {"ethnicity": "earther", "vocation": "captain"},
    "Naomi": {"ethnicity": "belter", "vocation": "technician"},
    "Alex": {"ethnicity": "martian", "vocation": "pilot"},
    "Clarissa": {"ethnicity": "earther", "vocation": "mechanic"},
}

In [37]:
data

{'James': {'ethnicity': 'earther', 'vocation': 'captain'},
 'Naomi': {'ethnicity': 'belter', 'vocation': 'technician'},
 'Alex': {'ethnicity': 'martian', 'vocation': 'pilot'},
 'Clarissa': {'ethnicity': 'earther', 'vocation': 'mechanic'}}

In [38]:
data["James"]

{'ethnicity': 'earther', 'vocation': 'captain'}
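
A nested dictionary like this maps directly onto JSON. The sketch below uses the standard-library `json` module to serialize a shortened copy of the data to text and parse it back.

```python
import json

data = {
    "James": {"ethnicity": "earther", "vocation": "captain"},
    "Naomi": {"ethnicity": "belter", "vocation": "technician"},
}

# Serialize the nested dictionary to a JSON string ...
text = json.dumps(data, indent=2)

# ... and parse that string back into Python dictionaries.
restored = json.loads(text)

print(restored["Naomi"]["vocation"])  # technician
```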

## List of dictionaries (JSON records)

In [45]:
data = [
    {"name": "James", "ethnicity": "earther", "vocation": "captain"},
    {"name": "Naomi", "ethnicity": "belter", "vocation": "technician"},
    {"name": "Alex", "ethnicity": "martian", "vocation": "pilot"},
    {"name": "Clarissa", "ethnicity": "earther", "vocation": "mechanic"},
]

In [46]:
data

[{'name': 'James', 'ethnicity': 'earther', 'vocation': 'captain'},
 {'name': 'Naomi', 'ethnicity': 'belter', 'vocation': 'technician'},
 {'name': 'Alex', 'ethnicity': 'martian', 'vocation': 'pilot'},
 {'name': 'Clarissa', 'ethnicity': 'earther', 'vocation': 'mechanic'}]

In [40]:
data[0]

{'name': 'James', 'ethnicity': 'earther', 'vocation': 'captain'}
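
Because every record carries its own field names, records can be filtered or re-keyed with ordinary comprehensions. The variable names `earthers` and `by_name` below are illustrative.

```python
data = [
    {"name": "James", "ethnicity": "earther", "vocation": "captain"},
    {"name": "Naomi", "ethnicity": "belter", "vocation": "technician"},
    {"name": "Alex", "ethnicity": "martian", "vocation": "pilot"},
    {"name": "Clarissa", "ethnicity": "earther", "vocation": "mechanic"},
]

# Select the names of the records that match a condition.
earthers = [record["name"] for record in data if record["ethnicity"] == "earther"]
print(earthers)  # ['James', 'Clarissa']

# Re-key the records by name to get a nested-dictionary layout.
by_name = {
    record["name"]: {key: value for key, value in record.items() if key != "name"}
    for record in data
}
print(by_name["Alex"])  # {'ethnicity': 'martian', 'vocation': 'pilot'}
```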

## Data frame

In [41]:
import pandas as pd

data = pd.DataFrame.from_records(data)

In [42]:
data

|   | name | ethnicity | vocation |
|---|------|-----------|----------|
| 0 | James | earther | captain |
| 1 | Naomi | belter | technician |
| 2 | Alex | martian | pilot |
| 3 | Clarissa | earther | mechanic |


In [43]:
data.loc[0, :]

name           James
ethnicity    earther
vocation     captain
Name: 0, dtype: object

## Common data formats: .csv

- csv: "comma-separated values"
- One row per observation
- Values separated by commas
- Two-dimensional data structure

```
name,ethnicity,vocation
James,earther,captain
Naomi,belter,technician
Alex,martian,pilot
Clarissa,earther,mechanic
```
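
A minimal sketch of reading this format with Python's standard-library `csv` module. The text is held in a string here so the example is self-contained; with a real file, the object returned by `open(...)` would take the place of `io.StringIO`.

```python
import csv
import io

csv_text = """name,ethnicity,vocation
James,earther,captain
Naomi,belter,technician
Alex,martian,pilot
Clarissa,earther,mechanic
"""

# DictReader uses the first line as column names and yields one dict per row.
reader = csv.DictReader(io.StringIO(csv_text))
records = list(reader)

print(reader.fieldnames)
print(records[0]["name"], records[0]["vocation"])
```

Each row becomes one record, so the result has the same shape as a list of dictionaries.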

## Common data formats: .json

- Hierarchical data format
- Values stored as key-value pairs
- More "branches" can be added

```
{
    "Nemesis Games": {
        "Characters": {
            "James Holden": {
                "Occupation": "Captain",
                "Ethnicity": "Earther",
                "Ship": {
                    "Name": "Rocinante",
                    "Class": "Corvette",
                    "Owner": "James Holden",
                    "Crew-size": 6
                }
            },
            "Naomi Nagata": {
                "Occupation": "Technician",
                "Ethnicity": "Belter",
                "Children": {
                    "Filip Nagata": {
                        "Name": "Filip Nagata",
                        "Occupation": "Unknown"
                    }
                }
            }
        }
    }
}
```
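
A sketch of how such a document might be parsed and navigated with the standard-library `json` module. The JSON text below is a shortened copy of the example, written with double quotes as JSON requires.

```python
import json

json_text = """
{
  "Nemesis Games": {
    "Characters": {
      "James Holden": {
        "Occupation": "Captain",
        "Ship": {"Name": "Rocinante", "Crew-size": 6}
      },
      "Naomi Nagata": {
        "Occupation": "Technician",
        "Children": {"Filip Nagata": {"Occupation": "Unknown"}}
      }
    }
  }
}
"""

book = json.loads(json_text)
characters = book["Nemesis Games"]["Characters"]

# Follow the keys down one branch of the tree.
print(characters["James Holden"]["Ship"]["Name"])  # Rocinante

# Branches differ: only some characters have a "Ship" key,
# so .get() is used for keys that may be missing.
for name, info in characters.items():
    print(name, info.get("Ship", {}).get("Name"))
```

Since branches need not share the same keys, code that walks the tree usually has to handle missing keys explicitly.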

# Summary

- Data can be organized in different formats and structures
- A *relational* data structure holds data in tables, which are limited to two dimensions
- A *hierarchical* data structure holds data in a tree-like structure