### Exercise


Given a zip file with a subfolder with multiple annotations, where the name convention for each one of them is: 

{DATE}_{TIME}_SN{SATELLITE_NUMBER}_QUICKVIEW_VISUAL_{VERSION}_{UNIQUE_REGION}.txt

where:

- DATE expressed as YYYYMMDD (year, month and day), e.g. 20241201, 20230321 ...
- TIME expressed as HHMMSS (hour, minutes and seconds), e.g. 2134307
- SATELLITE_NUMBER an integer that represents the satellite number.
- VERSION provides the version of the pipeline, e.g. "0_1_2", "1_3_1" ...
- UNIQUE_REGION provides a unique location in the form of a string, e.g SATL-2KM-10N_552_4164

Find out the following thing about your data:

1. How many files the annotations folder has.
2. How many of them follow the name convention expressed above.
3. How many of annotations you have per month and year. Which month has more annotation files.
4. Create a new annotations folder with multiple folders corresponding to a month.
5. Print all the annotations from the most recent to the oldest one. 
6. How many different satellites there are, how many annotations we have per satellite number, and which one was used in the most recent annotation file. 
7. How many unique regions there are.

some tips:
- str class has a method called split, you can use it to get each field per annotation.
- you can use sort from numpy on strings.

In [13]:
# Read the files in the folder annotations and print the number of files in the folder.
import os 
annotations = os.listdir('/Users/jonat/Library/Mobile Documents/com~apple~CloudDocs/VSCoding/Esade/pythonDS/session_4/annotations/')
print(f"Annotations folder has {len(annotations)} files")

Annotations folder has 206 files


In [14]:
import re

# Pattern of correct filenames
pattern = r'^\d{8}_\d{6,7}_SN\d+_QUICKVIEW_VISUAL_\d+_\d+_\d+_[\w-]+\.txt$'
correctAnnotations = 0
filename = '20241201_2134307_SN42_QUICKVIEW_VISUAL_1_3_1_SATL-2KM-10N_552_4164.txt'


for filename in annotations:
    # Check if the filename matches the pattern
    if re.match(pattern, filename):
        # Increment the counter
        correctAnnotations += 1
    else:
        continue

# Print the number of correct annotations
print(f"There are {correctAnnotations} correct annotations")

There are 194 correct annotations


In [15]:
from datetime import datetime

dateCounts = {}
# Loop through the filenames and count the number of annotations per month
for filename in annotations:
    # Split the filename by underscore
    splits = filename.split("_")
    date = splits[0]
    # Convert the date string to a datetime object
    date_obj = datetime.strptime(date, '%Y%m%d')

    # Extract the month and year
    month_str = date_obj.strftime('%B')
    year_int = date_obj.year

    # Combine the month and year
    combined = month_str + " " + str(year_int)

    # Check if the combined month and year is in the dictionary and increment the count
    if combined in dateCounts:
        dateCounts[combined] += 1
    else:
        dateCounts[combined] = 1

# Print the number of annotations per month
for key, value in dateCounts.items():
    print(f"{key} {value}")

# Find the month with the most annotations
max_key = max(dateCounts, key=dateCounts.get)
print(f"The month with the most annotations is {max_key}")

January 2024 27
June 2024 52
April 2024 37
February 2024 45
March 2024 17
May 2024 28
The month with the most annotations is June 2024


In [16]:
from datetime import datetime
import shutil

# Check if the folder annotations exists and delete it if it does
if os.path.exists('/Users/jonat/Library/Mobile Documents/com~apple~CloudDocs/VSCoding/Esade/pythonDS/Week4/annotations'):
        shutil.rmtree('/Users/jonat/Library/Mobile Documents/com~apple~CloudDocs/VSCoding/Esade/pythonDS/Week4/annotations')
    
# Create the folder annotations in Week4 directory and change the working directory to annotations
os.chdir('/Users/jonat/Library/Mobile Documents/com~apple~CloudDocs/VSCoding/Esade/pythonDS/Week4')
os.mkdir('annotations')
os.chdir('annotations')


dateCounts = {}
for filename in annotations:
    splits = filename.split("_")
    date = splits[0]
    date_obj = datetime.strptime(date, '%Y%m%d')

    month_str = date_obj.strftime('%B')
    year_int = date_obj.year

    combined = month_str + " " + str(year_int)

    # Check if the combined month and year is an existing directory and copy the file to the directory
    if os.path.exists(f'/Users/jonat/Library/Mobile Documents/com~apple~CloudDocs/VSCoding/Esade/pythonDS/Week4/annotations/{combined}'):
        shutil.copy(f'/Users/jonat/Library/Mobile Documents/com~apple~CloudDocs/VSCoding/Esade/pythonDS/session_4/annotations/{filename}', f'/Users/jonat/Library/Mobile Documents/com~apple~CloudDocs/VSCoding/Esade/pythonDS/Week4/annotations/{combined}')
    else:
    # Create the directory if it does not exist and copy the file to the directory
        os.mkdir(f'{combined}')
        shutil.copy(f'/Users/jonat/Library/Mobile Documents/com~apple~CloudDocs/VSCoding/Esade/pythonDS/session_4/annotations/{filename}', f'/Users/jonat/Library/Mobile Documents/com~apple~CloudDocs/VSCoding/Esade/pythonDS/Week4/annotations/{combined}')


In [17]:
# function to extract date from filename
def extractDate(filename):
    splits = filename.split("_")
    date = splits[0]
    time = splits[1]
    # Include the hours in the date to sort the annotations
    return datetime.strptime(date + time, '%Y%m%d%H%M%S')

# Sort the annotations by date and print them
sortedDates = sorted(annotations, key=extractDate)
for date in sortedDates:
    print(date)

20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3770.txt
20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3772.txt
20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4164.txt
20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4162.txt
20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_554_4162.txt
20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_392_3740.txt
20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_392_3742.txt
20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_396_3752.txt
20240102_185527_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_740_3850.txt
20240102_185605_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_690_3572.txt
20240102_185954_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_414_3786.txt
20240104_220339_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_556_4178.txt
20240110_192002_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_380_3728.txt
20240112_192510_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_386_3750.txt
202401

In [18]:
from datetime import datetime
import numpy as np
from collections import Counter

satellites = []
# Loop through the filenames and extract the satellite number
for filename in annotations:
    splits = filename.split("_")
    # Check if the satellite number is in the 2nd or 3rd index
    if splits[2].startswith("SN"):

        satellite = splits[2]
    elif splits[3].startswith("SN"):
        satellite = splits[3]
    else:
        break

    satellites.append(satellite)

# Print the unique satellites and the count of each satellite
satSet = set(satellites)
print(f"Unique satellites: {satSet}")
print(Counter(satellites))

# Find the most recent satellite
mostRecentSplit = sortedDates[0].split("_")
print(f"Most recent satellite {mostRecentSplit[2]}")


Unique satellites: {'SN27', 'SN28', 'SN29', 'SN26', 'SN33', 'SN30', 'SN24', 'SN31'}
Counter({'SN26': 5, 'SN24': 4, 'SN28': 4, 'SN33': 3, 'SN27': 2, 'SN31': 2, 'SN29': 1, 'SN30': 1})
Most recent satellite SN33


In [19]:
regions = []
for filename in annotations:
    splits = filename.split("_")
    # Extract the region from the filename and remove the .txt extension
    last = splits[-1]
    sLast = splits[-2]
    tLast = splits[-3]
    region = tLast + sLast + last
    region = region.removesuffix(".txt")
    regions.append(region)

# Print the unique regions
regionsUnique = set(regions)
print("Uniqe regions:", regionsUnique)
print(f"There are {len(regions)} unique regions")

Uniqe regions: {'SATL-2KM-11N5663734', 'SATL-2KM-11N4183862', 'SATL-2KM-11N4523740', 'SATL-2KM-11N4183724', 'SATL-2KM-11N6243630', 'SATL-2KM-39N5622788', 'SATL-2KM-11N5003600', 'SATL-2KM-11N3463786', 'SATL-2KM-51N7484364', 'SATL-2KM-11N4643828', 'SATL-2KM-11N7243614', 'SATL-2KM-11N3803764', 'SATL-2KM-11N7123566', 'SATL-2KM-11N5763720', 'SATL-2KM-11N4083712', 'SATL-2KM-11N3763724', 'SATL-2KM-11N7023566', 'SATL-2KM-11N7003690', 'SATL-2KM-11N2523954', 'SATL-2KM-11N4143786', 'SATL-2KM-11N7163848', 'SATL-2KM-11N5543610', 'SATL-2KM-11N3803722', 'SATL-2KM-11N5443742', 'SATL-2KM-10N6024148', 'SATL-2KM-11N7183640', 'SATL-2KM-10N5424168', 'SATL-2KM-11N7143632', 'SATL-2KM-10N7223848', 'SATL-2KM-10N5584184', 'SATL-2KM-10N5624170', 'SATL-2KM-51N7484366', 'SATL-2KM-10N6304262', 'SATL-2KM-11N7363716', 'SATL-2KM-10N5504202', 'SATL-2KM-10N5604178', 'SATL-2KM-11N4583756', 'SATL-2KM-10N6344282', 'SATL-2KM-10N7123948', 'SATL-2KM-11N4883638', 'SATL-2KM-11N2624022', 'SATL-2KM-11N5463742', 'SATL-2KM-11N50036