### Section 4: Basic Libraries I

<div class="alert alert-block alert-warning">
Given a zip file containing a subfolder with multiple annotation files, each file name follows this convention:

{DATE}_{TIME}_SN{SATELLITE_NUMBER}_QUICKVIEW_VISUAL_{VERSION}_{UNIQUE_REGION}.txt

where:

- DATE expressed as YYYYMMDD (year, month and day), e.g. 20241201, 20230321 ...
- TIME expressed as HHMMSS (hour, minutes and seconds), e.g. 215120
- SATELLITE_NUMBER an integer that represents the satellite number.
- VERSION provides the version of the pipeline, e.g. "0_1_2", "1_3_1" ...
- UNIQUE_REGION provides a unique location in the form of a string, e.g. SATL-2KM-10N_552_4164
</div>

In [90]:
import os 
import glob
import pandas as pd

In [91]:
os.chdir('/Users/julia/Desktop/ESADE/python/assignment_4/annotations')

<div class="alert alert-block alert-info">
<b>Exercise 1: Find out how many files the annotations folder has.</b> 

</div>

In [92]:
len(os.listdir())

207
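
`os.listdir()` returns every entry in the folder, including subfolders and hidden files such as `.DS_Store`. If only regular files should be counted, a minimal sketch could look like the following. The helper name `count_regular_files` is an assumption made for illustration and is not part of the original notebook.

```python
import os


def count_regular_files(folder="."):
    """Return the number of regular files directly inside `folder`.

    Subdirectories are skipped; hidden files are still counted.
    """
    with os.scandir(folder) as entries:
        return sum(1 for entry in entries if entry.is_file())
```

Calling `count_regular_files()` with no argument inspects the current working directory, which matches the `os.chdir` call above.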

<div class="alert alert-block alert-info">
<b>Exercise 2: Find out how many of the annotations follow the naming convention described above.</b> 

</div>

In [93]:
#use glob.glob(pattern) to search for files that match the given pattern
filtered = glob.glob('*_*_SN*_QUICKVIEW_VISUAL_*_*.txt')
len(filtered)

194
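
The glob pattern only checks the position of the underscores and the fixed words, so a name with a malformed date or version could still pass. A stricter check is possible with a regular expression. The sketch below assumes that VERSION always has three numeric parts (as in the examples "0_1_2" and "1_3_1") and that the satellite number is one or more digits. `NAME_PATTERN` and `follows_convention` are illustrative names, and the second sample file name is hypothetical.

```python
import re
from fnmatch import fnmatchcase

# Assumption: VERSION is three integers joined by underscores.
NAME_PATTERN = re.compile(
    r"(?P<date>\d{8})_(?P<time>\d{6})_SN(?P<satellite>\d+)"
    r"_QUICKVIEW_VISUAL_(?P<version>\d+_\d+_\d+)_(?P<region>.+)\.txt"
)


def follows_convention(filename):
    """Return True if the whole file name matches the naming convention."""
    return NAME_PATTERN.fullmatch(filename) is not None


samples = [
    "20240623_215120_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_596_4134.txt",
    # hypothetical name: accepted by the loose glob pattern, rejected by the regex
    "notes_final_SNx_QUICKVIEW_VISUAL_draft_v2.txt",
]

for name in samples:
    loose = fnmatchcase(name, "*_*_SN*_QUICKVIEW_VISUAL_*_*.txt")
    print(name, "| glob:", loose, "| regex:", follows_convention(name))
```

Applied to the real folder, the stricter check may or may not give the same count as the glob pattern, depending on what the non-conforming files look like.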

<div class="alert alert-block alert-info">
<b>Exercise 3: Find out how many annotations you have per month and year. Which month has the most annotation files?</b> 

</div>

In [94]:
#create year and month list
year = []
month = []

#loop through each filename in the filtered list to extract year and month
for i in filtered:

    #extract the year from the first 4 characters of the filename
    i_year = i[0:4]

    #extract the month from the next 2 characters after the year
    i_month = i[4:6]

    #append the extracted year to the year list
    year.append(i_year)

    #append the extracted month to the month list
    month.append(i_month)

#convert the list of years into a pandas Series and count the occurrences of each year
year = pd.Series(year).value_counts()

#display the count of files per year
year

2024    194
Name: count, dtype: int64

In [95]:
#convert the month list to a Series, count occurrences of each unique month,
#and turn the result into a DataFrame
month = pd.Series(month).value_counts().reset_index()

#rename columns to 'Month' and 'Count' for clarity
month.columns = ['Month', 'Count']

#display the resulting DataFrame with month counts
month


  Month  Count
0    06     52
1    02     45
2    05     28
3    01     27
4    04     25
5    03     17


In [96]:
print(f"The month with the most annotations is {month.loc[month['Count'].idxmax(), 'Month']}.")

The month with the most annotations is 06.
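
Since the exercise asks for counts per month and year together, one alternative is to count `(year, month)` pairs in a single pass with `collections.Counter`. This is a sketch using only the standard library rather than pandas. The function name `count_by_year_month` is illustrative, and the third sample file name is hypothetical.

```python
from collections import Counter


def count_by_year_month(filenames):
    """Count file names per (year, month), read from the leading YYYYMMDD date."""
    return Counter((name[0:4], name[4:6]) for name in filenames)


sample_files = [
    "20240623_215120_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_596_4134.txt",
    "20240619_215556_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_742_4460.txt",
    # hypothetical February file, added only to have a second month
    "20240214_101500_SN27_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_566_3734.txt",
]

counts = count_by_year_month(sample_files)
(top_year, top_month), top_count = counts.most_common(1)[0]
print(f"{top_year}-{top_month} has the most annotations ({top_count}).")
# prints: 2024-06 has the most annotations (2).
```

Keeping year and month together avoids mixing, for example, June 2023 with June 2024 if the folder ever spans more than one year.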


<div class="alert alert-block alert-info">
<b>Exercise 4: Create a new annotations folder containing one subfolder for each month.</b> 

</div>

In [97]:
#create new folder (exist_ok=True lets the cell be re-run without raising an error)
os.makedirs('/Users/julia/Desktop/ESADE/python/assignment_4/annotations_2', exist_ok=True)

In [98]:
#change directory to new folder
os.chdir('/Users/julia/Desktop/ESADE/python/assignment_4/annotations_2')

#create folders corresponding to months
#(the loop variable is named m so the month DataFrame is not overwritten)
for m in month["Month"]:
    os.mkdir(m)
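
The cell above only creates the empty month folders. If the intent is also to place each annotation in the folder of its month, a sketch under that assumption could look like the following. The function name `copy_into_month_folders` and its arguments are hypothetical, and files are copied rather than moved so the original folder stays intact.

```python
import shutil
from pathlib import Path


def copy_into_month_folders(src_dir, dst_dir):
    """Copy each matching annotation into dst_dir/<MM>/ and return the number copied."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    copied = 0
    for path in sorted(src_dir.glob("*_*_SN*_QUICKVIEW_VISUAL_*_*.txt")):
        # characters 4-5 of the name hold the month (YYYYMMDD)
        month_dir = dst_dir / path.name[4:6]
        month_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, month_dir / path.name)
        copied += 1
    return copied
```

Because the function takes both folders as arguments, it does not depend on the current working directory set by `os.chdir`.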

<div class="alert alert-block alert-info">
<b>Exercise 5: Print all the annotations from the most recent to the oldest one.</b> 

</div>

In [99]:
file_date = []

for i in filtered:
    #extract date and time (YYYYMMDD_HHMMSS, 15 characters) from file name
    i_date = i[0:15]

    #create a list of tuples where each tuple contains the date and file name
    file_date.append((i_date, i))

#sort list from most recent to oldest
file_date.sort(reverse=True)

#print each file name in sorted order
for date, file in file_date:
    print(file)


20240623_215120_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_596_4134.txt
20240623_215102_SN43_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_384_3750.txt
20240623_193704_SN27_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_566_3734.txt
20240619_215556_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_742_4460.txt
20240619_185757_SN24_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_528_3700.txt
20240619_052401_SN30_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-52N_368_4336.txt
20240618_215539_SN31_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_458_3756.txt
20240618_215539_SN31_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_452_3740.txt
20240618_193146_SN27_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_530_3682.txt
20240617_211350_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_724_3614.txt
20240617_184443_SN24_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_702_3566.txt
20240617_052859_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-51N_730_4348.txt
20240616_213053_SN30_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_460_3792.txt
20240616_213047_SN30_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_466_3828.txt
20240616_213047_SN30
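
Sorting the timestamp as a string works because the fields are fixed-width and ordered from year down to second. An alternative that also validates each timestamp is to parse it with `datetime.strptime` and sort on the resulting value. This is a sketch, and the helper name `capture_time` is an assumption for illustration.

```python
from datetime import datetime


def capture_time(filename):
    """Parse the leading YYYYMMDD_HHMMSS stamp (15 characters) into a datetime."""
    return datetime.strptime(filename[:15], "%Y%m%d_%H%M%S")


sample_files = [
    "20240619_052401_SN30_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-52N_368_4336.txt",
    "20240623_215120_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_596_4134.txt",
    "20240623_193704_SN27_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_566_3734.txt",
]

# newest first: reverse chronological order
newest_first = sorted(sample_files, key=capture_time, reverse=True)
for name in newest_first:
    print(name)
```

A file name whose first 15 characters are not a valid date and time raises `ValueError`, which makes malformed names easy to spot.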

<div class="alert alert-block alert-info">
<b>Exercise 6: Find out how many different satellites there are, how many annotations we have per satellite number, and which one was used in the most recent annotation file.</b> 

</div>

In [100]:
#create satellite dictionary 
satellite = {}

#go through list of annotation files 
for i in filtered:

    #extract satellite name 
    sn_name = i[16:20]

    #satellite counter conditional statement
    if sn_name not in satellite:
        satellite[sn_name] = 1
    else:
        satellite[sn_name] += 1

print("there are", len(satellite), "unique satellites.")

for i in satellite:
    print(f"satellite {i} has {satellite[i]} annotations.")

#file_date is sorted newest first, so index 0 holds the most recent file
print(file_date[0][1][16:20], "was used in the most recent annotation file")


there are 9 unique satellites.
satellite SN27 has 29 annotations.
satellite SN24 has 26 annotations.
satellite SN26 has 37 annotations.
satellite SN33 has 16 annotations.
satellite SN29 has 22 annotations.
satellite SN28 has 16 annotations.
satellite SN31 has 19 annotations.
satellite SN30 has 18 annotations.
satellite SN43 has 11 annotations.
SN29 was used in the most recent annotation file
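
The fixed slice `[16:20]` assumes a two-digit satellite number. A variant that does not depend on the number of digits is to split the name on underscores and take the third field, then count with `collections.Counter`. This is a sketch, and `satellite_of` is an illustrative helper name.

```python
from collections import Counter


def satellite_of(filename):
    """Return the satellite field (e.g. 'SN29'), the third '_'-separated token."""
    return filename.split("_")[2]


sample_files = [
    "20240623_215120_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_596_4134.txt",
    "20240623_215102_SN43_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_384_3750.txt",
    "20240619_215556_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_742_4460.txt",
    "20240618_215539_SN31_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_458_3756.txt",
]

per_satellite = Counter(satellite_of(name) for name in sample_files)
print("there are", len(per_satellite), "unique satellites.")
for sat, n in per_satellite.most_common():
    print(f"satellite {sat} has {n} annotations.")

# names start with a fixed-width timestamp, so the largest string is the newest file
most_recent_file = max(sample_files)
print(satellite_of(most_recent_file), "was used in the most recent annotation file")
```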


<div class="alert alert-block alert-info">
<b>Exercise 7: Find out how many unique regions there are.</b> 

</div>

In [101]:
region = []

#loop through each filename in the filtered list to extract the region information
for i in filtered:

    #extract the region: the 21 characters before the ".txt" extension
    location = i[-25:-4]

    #add the region to the list if it hasn't been added already
    if location not in region:
        region.append(location)

#print the total count of unique regions found
print("there are", len(region), "unique regions.")



there are 137 unique regions.
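
The slice `[-25:-4]` relies on every region string being exactly 21 characters long. A sketch that does not depend on the length is shown below: it takes everything after the version fields and collects the results in a set, which discards duplicates automatically. It assumes VERSION always has three parts, `region_of` is an illustrative name, and the second sample file name is hypothetical.

```python
def region_of(filename):
    """Return UNIQUE_REGION: everything after the three version fields."""
    stem = filename[: -len(".txt")]
    # after_marker looks like "1_7_0_SATL-2KM-10N_596_4134"
    after_marker = stem.split("_QUICKVIEW_VISUAL_", 1)[1]
    # split off the three version numbers; the remainder is the region
    return after_marker.split("_", 3)[3]


sample_files = [
    "20240623_215120_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_596_4134.txt",
    # hypothetical second capture of the same region
    "20240101_120000_SN27_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_596_4134.txt",
    "20240623_215102_SN43_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_384_3750.txt",
]

unique_regions = {region_of(name) for name in sample_files}
print("there are", len(unique_regions), "unique regions.")
# prints: there are 2 unique regions.
```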
