### Exercise


You are given a zip file containing a subfolder with multiple annotation files. Each file name follows this convention:

{DATE}_{TIME}_SN{SATELLITE_NUMBER}_QUICKVIEW_VISUAL_{VERSION}_{UNIQUE_REGION}.txt

where:

- DATE expressed as YYYYMMDD (year, month and day), e.g. 20241201, 20230321 ...
- TIME expressed as HHMMSS (hours, minutes and seconds), e.g. 213407
- SATELLITE_NUMBER an integer that represents the satellite number.
- VERSION provides the version of the pipeline, e.g. "0_1_2", "1_3_1" ...
- UNIQUE_REGION provides a unique location in the form of a string, e.g. SATL-2KM-10N_552_4164
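
The convention above can be sketched as a regular expression. This is a minimal sketch, not part of the original exercise: the names `ANNOTATION_RE` and `parse_annotation_name` are hypothetical, and the VERSION sub-pattern (three numbers joined by underscores) is an assumption inferred from the examples.

```python
import re

# Hypothetical pattern mirroring the naming convention.
# Assumption: VERSION is always three numbers joined by underscores.
ANNOTATION_RE = re.compile(
    r"^(?P<date>\d{8})_(?P<time>\d{6})_SN(?P<satellite>\d+)"
    r"_QUICKVIEW_VISUAL_(?P<version>\d+_\d+_\d+)_(?P<region>.+)\.txt$"
)


def parse_annotation_name(name):
    """Return the fields as a dict, or None if the name does not match."""
    match = ANNOTATION_RE.match(name)
    return match.groupdict() if match else None


fields = parse_annotation_name(
    "20240623_215120_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_596_4134.txt"
)
print(fields)
```

Named groups keep each field accessible by name, which avoids counting positions after a split.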

Find out the following things about your data:

1. How many files the annotations folder contains.
2. How many of them follow the naming convention described above.
3. How many annotations you have per month and year, and which month has the most annotation files.
4. Create a new annotations folder with one subfolder per month.
5. Print all the annotations from the most recent to the oldest.
6. How many different satellites there are, how many annotations there are per satellite number, and which satellite was used in the most recent annotation file.
7. How many unique regions there are.

Some tips:
- The str class has a method called split; you can use it to get each field of an annotation name.
- You can use sort from numpy on strings.
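
The tips can be tried on a few names before touching the real folder. The sketch below is only an illustration: it uses the built-in `sorted` rather than numpy's sort (a substitution made here so the example needs no extra package), and for plain file names like these both should give the same alphabetical order. The three names come from the listing printed in Exercise 5.

```python
# Sample names taken from the listing printed in Exercise 5.
names = [
    "20240617_052859_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-51N_730_4348.txt",
    "20240623_215120_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_596_4134.txt",
    "20240619_185757_SN24_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_528_3700.txt",
]

# split('_') breaks a name into its underscore-separated fields.
parts = names[0].split("_")
date, time, satellite = parts[0], parts[1], parts[2]
print(date, time, satellite)

# Names start with YYYYMMDD_HHMMSS, so alphabetical order is chronological.
oldest_first = sorted(names)
newest_first = oldest_first[::-1]
print(newest_first[0])
```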

Exercise 1

In [22]:
import os

path = 'C:\\Users\\nelso\\OneDrive\\Desktop\\ESADE\\Term_1\\Python_1\\session_4\\annotations'
annotations = os.listdir(path)
total_files = len(annotations)
total_files

206

Exercise 2

In [23]:
valid_files = 0

for file in annotations:
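    parts = file.split('_')
    # A valid name splits into 11 fields: DATE, TIME, SN<number>, QUICKVIEW,
    # VISUAL, three VERSION numbers and three UNIQUE_REGION pieces.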
    if len(parts) == 11 and parts[0].isdigit() and len(parts[0]) == 8 and \
       parts[1].isdigit() and len(parts[1]) == 6 and \
       parts[2].startswith('SN') and parts[2][2:].isdigit() and \
       parts[3] == 'QUICKVIEW' and parts[4] == 'VISUAL':
        valid_files += 1
valid_files

194
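
The check above only tests that DATE and TIME are digits of the right length, so a name such as `20241399_...` would still count as valid. A stricter variant could also confirm that they form a real calendar timestamp. The sketch below assumes that is desirable; the helper name `has_valid_timestamp` is hypothetical, and applying it might change the count reported above.

```python
from datetime import datetime


def has_valid_timestamp(name):
    """Return True if the first two fields form a real date and time."""
    parts = name.split("_")
    if len(parts) < 2:
        return False
    date, time = parts[0], parts[1]
    # Enforce the exact widths first, since strptime alone is more lenient.
    if not (date.isdigit() and len(date) == 8):
        return False
    if not (time.isdigit() and len(time) == 6):
        return False
    try:
        datetime.strptime(f"{date}_{time}", "%Y%m%d_%H%M%S")
    except ValueError:
        return False
    return True


print(has_valid_timestamp(
    "20240623_215120_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_596_4134.txt"
))
```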

Exercise 3

In [25]:
year_month_dict = {}

for file in annotations:
    if len(file) >= 8 and file[:8].isdigit():
        year_month = file[:6]  # YYYYMM
        if year_month in year_month_dict:
            year_month_dict[year_month] += 1
        else:
            year_month_dict[year_month] = 1

# Print the number of annotations per month/year
for ym, count in year_month_dict.items():
    print(f"{ym}: {count} annotations")

# Find the month with the most annotations
most_common_month = max(year_month_dict, key=year_month_dict.get)
print(f"Most annotations: {most_common_month}, {year_month_dict[most_common_month]} files")

202401: 27 annotations
202402: 45 annotations
202403: 17 annotations
202404: 37 annotations
202405: 28 annotations
202406: 52 annotations
Most annotations: 202406, 52 files
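
The manual dictionary counting above could also be written with `collections.Counter`. This is a sketch of an alternative, not the solution that produced the output above; the names in `sample` are made up for illustration and are not files from the dataset.

```python
from collections import Counter

# Made-up names for illustration only.
sample = [
    "20240105_101500_SN24_example.txt",
    "20240612_081000_SN29_example.txt",
    "20240620_093000_SN30_example.txt",
    "README.md",
]

# Count files per YYYYMM, skipping names that do not start with 8 digits.
per_month = Counter(
    name[:6] for name in sample if len(name) >= 8 and name[:8].isdigit()
)

# most_common(1) returns a list holding the single (key, count) pair with
# the highest count.
month, count = per_month.most_common(1)[0]
print(f"Most annotations: {month}, {count} files")
```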


Exercise 4

In [26]:
import shutil
import os

new_folder = r'C:\Users\nelso\OneDrive\Desktop\ESADE\Term_1\Python_1\session_4\monthly_annotations'
os.makedirs(new_folder, exist_ok=True)

for file in annotations:
    if len(file) >= 8 and file[:8].isdigit():
        year_month = file[:6]  # YYYYMM
        month_folder = os.path.join(new_folder, year_month)
        os.makedirs(month_folder, exist_ok=True)
        shutil.copy(os.path.join(path, file), os.path.join(month_folder, file))

Exercise 5

In [27]:
# Sort in place, in reverse, so the most recent file comes first.
# Names start with YYYYMMDD_HHMMSS, so alphabetical order is chronological.
annotations.sort(reverse=True)
for file in annotations:
    print(file)

20240623_215120_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_596_4134.txt
20240623_215102_SN43_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_384_3750.txt
20240623_193704_SN27_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_566_3734.txt
20240619_215556_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_742_4460.txt
20240619_185757_SN24_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_528_3700.txt
20240619_052401_SN30_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-52N_368_4336.txt
20240618_215539_SN31_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_458_3756.txt
20240618_215539_SN31_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_452_3740.txt
20240618_193146_SN27_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_530_3682.txt
20240617_211350_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_724_3614.txt
20240617_184443_SN24_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_702_3566.txt
20240617_052859_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-51N_730_4348.txt
20240616_213053_SN30_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_460_3792.txt
20240616_213047_SN30_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_466_3828.txt
20240616_213047_SN30

Exercise 6

In [18]:
satellite_dict = {}

for file in annotations:
    parts = file.split('_')
    if len(parts) >= 3 and parts[2].startswith('SN'):
        satellite = parts[2]
    # Account for files with different formatting
    else:
        satellite = parts[4]
    # Add one to this satellite's count, starting from 0 if it is new
    satellite_dict[satellite] = satellite_dict.get(satellite, 0) + 1

# Print satellite counts
for sat, count in satellite_dict.items():
    print(f"{sat}: {count} annotations")

# Find the satellite of the most recent file (names sort chronologically)
sorted_annotations = sorted(annotations, reverse=True)
most_recent_satellite = sorted_annotations[0].split('_')[2]
print(f"Most recent satellite: {most_recent_satellite}")

SN29: 22 annotations
SN43: 11 annotations
SN27: 29 annotations
SN24: 26 annotations
SN30: 18 annotations
SN31: 19 annotations
SN28: 16 annotations
SN33: 16 annotations
SN26: 37 annotations
NS33: 1 annotations
NS28: 1 annotations
NS43: 2 annotations
NS24: 5 annotations
NS29: 2 annotations
NS30: 1 annotations
Most recent satellite: SN29


Exercise 7

In [20]:
unique_regions = set()

for file in annotations:
    parts = file.split('_')
    region = '_'.join(parts[-3:]).replace('.txt', '')
    unique_regions.add(region)

unique_regions_count = len(unique_regions)
print(f"Unique regions: {unique_regions_count}")

Unique regions: 146
