# Parsing Metadata from Filenames: Working with Strings, Lists, and Dicts

## Key-Value Mappings: Dictionaries

| Code | Description |
| :-- | :-- |
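| data = {} | Make an empty Dict |
| data = {'a': 3, 'b': 5} | Make a Dict with two items: "a" and "b" |
| data['a'] | Get the value stored under the key "a" |
| data['c'] = 7 | Add a new item "c" with the value 7 (or overwrite it if "c" already exists) |
| list(data.keys()) | Get a list of all the keys in the Dict |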
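
The operations in the table can be tried together in one short cell. This is a minimal sketch; the names `value_a` and `keys` are only illustrative.

```python
# Start with a dict that has two items
data = {'a': 3, 'b': 5}

# Look up the value stored under the key 'a'
value_a = data['a']

# Add a new item: key 'c' with value 7
data['c'] = 7

# Collect all keys into a list (dicts keep insertion order)
keys = list(data.keys())

print(value_a)  # 3
print(keys)     # ['a', 'b', 'c']
```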




**Exercises**

The `image` dict describes how researcher Tom's recording is formatted:

In [None]:
image = {'height': 1920, 'width': 1080, 'format': 'RGB', 'order': 'F'}
image

{'height': 1920, 'width': 1080, 'format': 'RGB', 'order': 'F'}

Write the code to print out the width of the image, by accessing the `"width"` key:

In [None]:
image['width']

1080

What is the height of the image?

In [None]:
image['height']

1920

How are the pixel data in the image formatted?

In [None]:
image['format']

'RGB'

What happens if you use the same approach to find out which key has the value `1080`?  What does this tell you about what key-value maps like Dictionaries are designed for?

In [None]:
image[1080]

KeyError: 1080
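
Square brackets only ever search the keys of a dict, so `image[1080]` fails even though `1080` is stored as a value. If you do need the key that holds a given value, one possible approach is to loop over the key-value pairs with `items()`. The sketch below assumes the `image` dict from above; `matching_keys` is a hypothetical name.

```python
image = {'height': 1920, 'width': 1080, 'format': 'RGB', 'order': 'F'}

# Collect every key whose value equals 1080 (there could be none, or several)
matching_keys = [key for key, value in image.items() if value == 1080]
print(matching_keys)  # ['width']
```

This has to check every item one by one, which is why dicts are best organized so that the thing you want to look up is the key.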

Make a dictionary: Reorganize the code below: tell Python that the three variables below all belong together by putting them into a dictionary called `session`.

In [None]:
subject = "Josie"
date = "2023-07-23"
group = "control"

session = {'subject': subject, 'date': date, 'group': group}
session

{'subject': 'Josie', 'date': '2023-07-23', 'group': 'control'}

Check that the dictionary is constructed properly by getting the subject from it. It should show "Josie"

In [None]:
session['subject']

'Josie'

In [None]:
default_session = {'subject': 'Ken', 'experimenter': 'Barbie', 'time': '09:00', 'notes': 'Nothing new.'}
today_vars = {'subject':  'Allan', 'notes': 'Did a good job.'}


In [None]:

session1 = default_session | today_vars
session1

{'subject': 'Allan',
 'experimenter': 'Barbie',
 'time': '09:00',
 'notes': 'Did a good job.'}
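
The `|` operator used above merges two dicts into a new one; where both dicts share a key, the value from the right-hand dict wins. It requires Python 3.9 or newer. On older versions, unpacking with `**` gives the same result. A small sketch, with `merged_new` and `merged_old` as illustrative names:

```python
default_session = {'subject': 'Ken', 'experimenter': 'Barbie', 'time': '09:00', 'notes': 'Nothing new.'}
today_vars = {'subject': 'Allan', 'notes': 'Did a good job.'}

# Python 3.9+: merge operator, right-hand values override left-hand ones
merged_new = default_session | today_vars

# Older Pythons: unpack both dicts into a new dict literal
merged_old = {**default_session, **today_vars}

print(merged_new == merged_old)    # True
print(default_session['subject'])  # Ken  (the original dicts are left unchanged)
```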

## Extracting Metadata from Strings

| Code | Description |
| :--- | :--- |
| **Indexing by Position (i.e. "Slicing" a String)** |   |
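| bonn = "BonnKölnAachen"[:4] | Get the first 4 characters: "Bonn" |
| köln = "BonnKölnAachen"[4:8] | Get the characters from position 4 up to (but not including) position 8: "Köln" |
| aach = "BonnKölnAachen"[8:] | Get everything from position 8 to the end: "Aachen" |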
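
A slice `text[start:stop]` includes the character at `start` and excludes the one at `stop`; leaving out either side means "from the beginning" or "to the end". The sketch below repeats the slices from the table and adds a negative index, which counts from the end. The name `cities` is illustrative.

```python
cities = "BonnKölnAachen"

print(cities[:4])    # Bonn    (positions 0, 1, 2, 3)
print(cities[4:8])   # Köln    (positions 4, 5, 6, 7)
print(cities[8:])    # Aachen  (position 8 to the end)
print(cities[-6:])   # Aachen  (the last six characters)
```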

**Exercises**

This researcher had a rule for her filenames: she would store session metadata in **fixed-length** strings, with information always in the same order:
  - **Subject Name**: 6 Characters
  - **Date**: 8 Characters
  - **Treatment Group**: 7 Characters
  - **Session Number**: 5 Characters ("sess" and then the number)

That way, when she later needed the information, she could extract it from the filename just by slicing it!
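
The slice positions follow directly from adding up the field widths in the list above. As a rough sketch of that bookkeeping (the names `widths` and `fields` are assumptions made for this illustration):

```python
fname = "Arthur20241008controlsess1.txt"

# Field widths from the naming rule: subject, date, group, session
widths = [6, 8, 7, 5]

# Walk along the string, cutting off one field per width
fields = []
start = 0
for width in widths:
    stop = start + width
    fields.append(fname[start:stop])
    start = stop

print(fields)  # ['Arthur', '20241008', 'control', 'sess1']
```

The running `start` values (0, 6, 14, 21) are the same numbers that appear in the slices used in the exercises below.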

What subject's data is in this file?

In [None]:
fname = "Arthur20241008controlsess1.txt"   # Filename convention: Subject, Date, Group, Session
fname[:6]

'Arthur'

What group is this subject in?

In [None]:
fname = "Arthur20241008controlsess1.txt"   # Filename convention: Subject, Date, Group, Session
fname[14:21]

'control'

What Session number was this?  Turn it from a string into an int with the `int()` function.

In [None]:
fname = "Arthur20241008controlsess1.txt"   # Filename convention: Subject, Date, Group, Session
int(fname[25])

1

Extract all four metadata variables from the following file and put them into their own variables. Note that the subject has fewer than 6 characters in their name, so the name is padded with underscores. After slicing the data, you can replace the underscore characters with empty strings by using the `replace()` method on strings (e.g. `"name__".replace('_', '')`):

In [None]:
fname = "Joe___20241009experimsess1.txt"  # Filename convention: Subject, Date, Group, Session
subject = fname[:6].replace('_', '')
date = fname[6:14]
group = fname[14:21]
sess = int(fname[25])
subject, date, group, sess

('Joe', '20241009', 'experim', 1)

Make a dictionary with the keys "Subject", "Date", "Group", and "SessionNum" with the data from this filename:

In [None]:
fname = "Arthur20241008controlsess1.txt"   # Filename convention: Subject, Date, Group, Session
session = {
    "Subject": fname[:6], 
    "Date": fname[6:14], 
    "Group": fname[14:21],
    "SessionNum": int(fname[25]),
}
session

{'Subject': 'Arthur', 'Date': '20241008', 'Group': 'control', 'SessionNum': 1}
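
Since the same slices are needed for every file, one option is to wrap them in a function. This is a sketch only: `parse_fname` is a hypothetical helper, not part of the original exercise, and it assumes every filename follows the fixed-width rule above with a single-digit session number.

```python
def parse_fname(fname):
    """Split a fixed-width filename into its metadata fields."""
    return {
        "Subject": fname[:6].replace('_', ''),  # drop padding underscores
        "Date": fname[6:14],
        "Group": fname[14:21],
        "SessionNum": int(fname[25]),           # assumes one digit after "sess"
    }

print(parse_fname("Joe___20241009experimsess1.txt"))
# {'Subject': 'Joe', 'Date': '20241009', 'Group': 'experim', 'SessionNum': 1}
```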

Building a table of metadata usually has the following steps, which can be done in a loop:

1. Extract data into a dictionary
2. Append the dictionary into a list of dictionaries
3. Change the list of dictionaries into a data frame (the table)

**Example**: The loop below extracts the letter and the number from each filename to make a table.  The original filename is included in its own column, to make finding the file later simpler:

In [None]:
fnames = ["a2.txt", "b3.txt"]

In [None]:
import pandas as pd

all_sessions = []
for fname in fnames:
    session = {
        "Letter": fname[0],
        "Number": int(fname[1]),
        "Filename": fname,
    }
    all_sessions.append(session)

df = pd.DataFrame(all_sessions)
df

|  | Letter | Number | Filename |
| :-- | :-- | :-- | :-- |
| 0 | a | 2 | a2.txt |
| 1 | b | 3 | b3.txt |



**Exercise**: Fill in the missing data extraction code for the filenames below to make a session table. Include the original filename in its own column, to make finding the file later simpler:


In [None]:
fnames = ["Arthur20241008controlsess1.txt", "Joseph20241009controlsess1.txt", "Arthur20241010treatmesess2.txt", "Joseph20241011controlsess2.txt"]
fnames

['Arthur20241008controlsess1.txt',
 'Joseph20241009controlsess1.txt',
 'Arthur20241010treatmesess2.txt',
 'Joseph20241011controlsess2.txt']

In [None]:
all_sessions = []
for fname in fnames:
    session = {
        "Subject": fname[0:6],
        "Date": fname[6:14],
        "Group": fname[14:21],
        "SessionNum": int(fname[25:26]),
        'Filename': fname,
    }
    all_sessions.append(session)

df = pd.DataFrame(all_sessions)
df

|  | Subject | Date | Group | SessionNum | Filename |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | Arthur | 20241008 | control | 1 | Arthur20241008controlsess1.txt |
| 1 | Joseph | 20241009 | control | 1 | Joseph20241009controlsess1.txt |
| 2 | Arthur | 20241010 | treatme | 2 | Arthur20241010treatmesess2.txt |
| 3 | Joseph | 20241011 | control | 2 | Joseph20241011controlsess2.txt |
