# Task
For each of the folders "/content/drive/MyDrive/착과군/tom1" through "/content/drive/MyDrive/착과군/tom6", combine the "file_attributes" and "growth_indicators" fields of every JSON file in that folder into one Excel file per "tom" folder, with each variable as a column and each JSON file as a row.

## Mount Google Drive

### Subtask:
Mount Google Drive to access files.


**Reasoning**:
Mount Google Drive to access the files stored there.



In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Define input and output paths

### Subtask:
Specify the base directory in Google Drive and the output directory for Excel files.


**Reasoning**:
Define the base and output directories, and create the output directory if it doesn't exist.



In [None]:
# Create the folder that will hold the output dataset for the fruit-set group (착과군)

import os

base_dir = '/content/drive/MyDrive/착과군'
output_dir = '/content/drive/MyDrive/착과군/output_excel_files'

os.makedirs(output_dir, exist_ok=True)  # no-op if the folder already exists

print(f"Base directory: {base_dir}")
print(f"Output directory: {output_dir}")

## Iterate through 'tom' folders

### Subtask:
Loop through each folder from 'tom1' to 'tom6'.


**Reasoning**:
Create a list of folder names and iterate through them as instructed.



In [None]:
# Build the list of the six per-farm folders that hold the tomato growth data

folder_names = [f'tom{i}' for i in range(1, 7)]

for folder_name in folder_names:
    print(f"Processing folder: {folder_name}")
    # The next steps will involve processing the files within this folder
    # For now, just print the folder name to confirm the loop is working

## Process json files in each folder

### Subtask:
For each folder, find all JSON files.


**Reasoning**:
Iterate through the specified folders and find all JSON files within each folder.



In [None]:
# Group the mixed JSON files by farm (one 'tom' folder per farm)

import os

folder_names = [f'tom{i}' for i in range(1, 7)]

for folder_name in folder_names:
    print(f"Processing folder: {folder_name}")
    folder_path = os.path.join(base_dir, folder_name)
    entries = os.listdir(folder_path)
    json_files = [entry for entry in entries if entry.endswith('.json')]
    print(f"Found JSON files: {json_files}")
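
The listing step above relies on `os.listdir`, which returns entries in arbitrary order and does not distinguish files from directories. A minimal sketch of a stricter variant follows, assuming only the standard library; `list_json_files` is a hypothetical helper name, not something the notebook defines.

```python
from pathlib import Path


def list_json_files(folder):
    """Return the .json files directly inside `folder`, sorted by name."""
    folder = Path(folder)
    return sorted(
        p for p in folder.iterdir()
        # Keep regular files only, and accept .JSON as well as .json.
        if p.is_file() and p.suffix.lower() == ".json"
    )
```

Sorting the result makes the row order of the later DataFrame reproducible between runs, which can help when comparing outputs.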

## Extract data from json files

### Subtask:
Read each JSON file and extract the required variables from "file_attributes" and "growth_indicators".


**Reasoning**:
Read each JSON file, extract the required data from "file_attributes" and "growth_indicators", and store it in a list of dictionaries.



In [None]:
# Select what to extract from each JSON file: file_attributes, which describes the
# individual tomato plant, and growth_indicators, which describes its growth

import json

folder_names = [f'tom{i}' for i in range(1, 7)]

for folder_name in folder_names:
    print(f"Processing folder: {folder_name}")
    folder_path = os.path.join(base_dir, folder_name)
    entries = os.listdir(folder_path)
    json_files = [entry for entry in entries if entry.endswith('.json')]

    all_data = []  # List to store data for the current folder

    for json_file in json_files:
        file_path = os.path.join(folder_path, json_file)
        with open(file_path, 'r', encoding='utf-8') as f:
            data = json.load(f)

        extracted_data = {}
        if 'file_attributes' in data:
            extracted_data.update(data['file_attributes'])
        if 'growth_indicators' in data:
            extracted_data.update(data['growth_indicators'])

        all_data.append(extracted_data)

    # At this point, all_data contains the extracted data for the current folder.
    # The next step will be to convert this to a DataFrame and save it to Excel.
    print(f"Extracted data for {len(all_data)} JSON files in {folder_name}")
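
Because the two `update` calls merge both sections into one flat dictionary, a key that appeared in both `file_attributes` and `growth_indicators` would keep only the `growth_indicators` value. The notebook does not show whether such a clash occurs in this dataset. The sketch below shows one possible way to keep both values; `flatten_record` is a hypothetical helper, and the `date` key and all values in the sample record are made up for illustration.

```python
def flatten_record(data, sections=("file_attributes", "growth_indicators")):
    """Merge the given sections into one flat dict without losing values."""
    row = {}
    for section in sections:
        for key, value in data.get(section, {}).items():
            # On a name clash, keep both values by prefixing the later one.
            column = key if key not in row else f"{section}.{key}"
            row[column] = value
    return row


# Hypothetical record; the keys and values are illustrative only.
sample = {
    "file_attributes": {"number": 3, "date": "2023-05-01"},
    "growth_indicators": {"week": 12, "date": "2023-05-02"},
}
print(flatten_record(sample))
```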

## Create a dataframe

### Subtask:
Organize the extracted data into a pandas DataFrame.


**Reasoning**:
Convert the extracted data for the current folder into a pandas DataFrame and display the head of the DataFrame.



In [None]:
# Extract the values under file_attributes and growth_indicators from each JSON file
# and write them to an Excel file, one column per variable

import json
import os
import pandas as pd

folder_names = [f'tom{i}' for i in range(1, 7)]

for folder_name in folder_names:
    print(f"Processing folder: {folder_name}")
    folder_path = os.path.join(base_dir, folder_name)
    entries = os.listdir(folder_path)
    json_files = [entry for entry in entries if entry.endswith('.json')]

    all_data = []  # List to store data for the current folder

    for json_file in json_files:
        file_path = os.path.join(folder_path, json_file)
        with open(file_path, 'r', encoding='utf-8') as f:
            data = json.load(f)

        extracted_data = {}
        if 'file_attributes' in data:
            extracted_data.update(data['file_attributes'])
        if 'growth_indicators' in data:
            extracted_data.update(data['growth_indicators'])

        all_data.append(extracted_data)

    df = pd.DataFrame(all_data)
    print(f"DataFrame for {folder_name}:")
    display(df.head())

    # Save this folder's DataFrame to its own Excel file before moving to the next folder
    output_filename = f"{folder_name}.xlsx"
    output_filepath = os.path.join(output_dir, output_filename)
    df.to_excel(output_filepath, index=False)
    print(f"Successfully saved {folder_name} data to {output_filepath}")
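
When `pd.DataFrame` receives a list of dictionaries, it uses the union of all keys as columns and fills missing entries with `NaN`. The small example below illustrates this with made-up rows; the column names come from the later sorting task, but the values are not from the dataset.

```python
import pandas as pd

# Made-up rows: the second record lacks 'fruitingNode'.
rows = [
    {"number": 1, "week": 3, "fruitingNode": 5},
    {"number": 2, "week": 3},
]
df_demo = pd.DataFrame(rows)

# The missing value shows up as NaN rather than raising an error.
print(df_demo)
print(df_demo["fruitingNode"].isna().tolist())
```

So a JSON file that lacks a field still produces a row, with an empty cell for that field in the Excel output.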

## Write dataframe to excel

### Subtask:
Save the DataFrame to an Excel file named after the 'tom' folder.


**Reasoning**:
Construct the output file path and save the DataFrame to an Excel file.



In [7]:
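# After extraction, create the Excel file for each tom farm.
# Note: this cell runs outside the loop, so it saves only the last folder
# processed (tom6); every folder was already saved inside the loop above.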

output_filename = f"{folder_name}.xlsx"
output_filepath = os.path.join(output_dir, output_filename)
df.to_excel(output_filepath, index=False)
print(f"Successfully saved {folder_name} data to {output_filepath}")

Successfully saved tom6 data to /content/drive/MyDrive/착과군/output_excel_files/tom6.xlsx


## Summary:

### Data Analysis Key Findings

*   The process successfully extracted data from the `file_attributes` and `growth_indicators` fields of JSON files located within the `tom1` through `tom6` folders.
*   Data from each JSON file was organized into a row in a pandas DataFrame, with the variables from the extracted fields becoming the columns.
*   Separate Excel files were generated for each "tom" folder (`tom1.xlsx` through `tom6.xlsx`), containing the combined data from the JSON files within that folder.

### Insights or Next Steps

*   The structured Excel files are now ready for further analysis or integration into other workflows.
*   Consider adding error handling for potential issues like missing fields in JSON files or file read errors.
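
The error handling suggested above could look roughly like the following sketch. It assumes that unreadable or malformed files should be skipped with a message rather than stopping the run; `load_record` is a hypothetical helper name, not part of the notebook.

```python
import json


def load_record(file_path):
    """Return the merged fields of one JSON file, or None if it cannot be read."""
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, ValueError) as exc:
        # OSError covers missing or unreadable files; ValueError covers
        # invalid JSON (json.JSONDecodeError) and decoding errors.
        print(f"Skipping {file_path}: {exc}")
        return None
    if not isinstance(data, dict):
        print(f"Skipping {file_path}: top-level JSON value is not an object")
        return None

    row = {}
    for section in ("file_attributes", "growth_indicators"):
        values = data.get(section)
        if isinstance(values, dict):
            row.update(values)
        else:
            print(f"{file_path}: missing or malformed '{section}'")
    return row
```

Inside the folder loop, rows returned as `None` would be filtered out before building the DataFrame.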


# Task
Sort the Excel files located at "/content/drive/MyDrive/착과군/output_excel_files/tom1.xlsx" through "/content/drive/MyDrive/착과군/output_excel_files/tom6.xlsx" by the columns 'number', 'week', and 'fruitingNode' in ascending order.

## Iterate through 'tom' folders

### Subtask:
Loop through each folder name from 'tom1' to 'tom6'.


**Reasoning**:
Iterate through the specified folders.



In [None]:
folder_names = [f'tom{i}' for i in range(1, 7)]

for folder_name in folder_names:
    print(f"Processing folder: {folder_name}")
    # The subsequent steps will involve reading, sorting, and saving the Excel file

**Reasoning**:
Construct the full path to the Excel file for the current folder.



In [None]:
import os

folder_names = [f'tom{i}' for i in range(1, 7)]
output_dir = '/content/drive/MyDrive/착과군/output_excel_files'

for folder_name in folder_names:
    print(f"Processing folder: {folder_name}")
    output_filename = f"{folder_name}.xlsx"
    output_filepath = os.path.join(output_dir, output_filename)
    print(f"Processing file: {output_filepath}")
    # The next steps will involve reading the Excel file

**Reasoning**:
Read the Excel file into a pandas DataFrame.



In [None]:
# Read each Excel file into a DataFrame
import pandas as pd
import os

folder_names = [f'tom{i}' for i in range(1, 7)]
output_dir = '/content/drive/MyDrive/착과군/output_excel_files'

for folder_name in folder_names:
    print(f"Processing folder: {folder_name}")
    output_filename = f"{folder_name}.xlsx"
    output_filepath = os.path.join(output_dir, output_filename)
    print(f"Processing file: {output_filepath}")
    df = pd.read_excel(output_filepath)
    print(f"Read {len(df)} rows from {output_filename}")
    # The next steps will involve sorting and saving the DataFrame

**Reasoning**:
Sort the DataFrame by the specified columns.



In [None]:
# Read each tom farm's Excel file and sort by tomato plant, then week, then fruiting node
folder_names = [f'tom{i}' for i in range(1, 7)]
output_dir = '/content/drive/MyDrive/착과군/output_excel_files'

for folder_name in folder_names:
    print(f"Processing folder: {folder_name}")
    output_filename = f"{folder_name}.xlsx"
    output_filepath = os.path.join(output_dir, output_filename)
    print(f"Processing file: {output_filepath}")
    df = pd.read_excel(output_filepath)
    print(f"Read {len(df)} rows from {output_filename}")

    df_sorted = df.sort_values(by=['number', 'week', 'fruitingNode'], ascending=True)
    print(f"Sorted DataFrame for {folder_name}")
    # The next steps will involve saving the sorted DataFrame
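
To make the sort priority concrete, here is a small in-memory example using the same `sort_values` call. The values are made up; only the column names come from the task. `sort_values` returns a new DataFrame and, by default, places `NaN` values last.

```python
import pandas as pd

# Made-up data with ties on 'number' and 'week' to show the tie-breaking order.
demo = pd.DataFrame({
    "number": [2, 1, 1, 1],
    "week": [1, 2, 1, 1],
    "fruitingNode": [4, 3, 7, 5],
})

# Rows are ordered by 'number' first, then 'week', then 'fruitingNode'.
demo_sorted = demo.sort_values(
    by=["number", "week", "fruitingNode"], ascending=True
).reset_index(drop=True)
print(demo_sorted)
```

The two rows with `number == 1` and `week == 1` end up ordered by `fruitingNode` (5 before 7), which is the behavior the task asks for.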

**Reasoning**:
Save the sorted DataFrame back to the original Excel file, overwriting the unsorted file.



In [None]:
# Overwrite the previous data file with the newly sorted Excel file
folder_names = [f'tom{i}' for i in range(1, 7)]
output_dir = '/content/drive/MyDrive/착과군/output_excel_files'

for folder_name in folder_names:
    print(f"Processing folder: {folder_name}")
    output_filename = f"{folder_name}.xlsx"
    output_filepath = os.path.join(output_dir, output_filename)
    print(f"Processing file: {output_filepath}")
    df = pd.read_excel(output_filepath)
    print(f"Read {len(df)} rows from {output_filename}")

    df_sorted = df.sort_values(by=['number', 'week', 'fruitingNode'], ascending=True)
    print(f"Sorted DataFrame for {folder_name}")

    df_sorted.to_excel(output_filepath, index=False)
    print(f"Successfully saved sorted data to {output_filepath}")

## Summary:

### Data Analysis Key Findings

*   The analysis successfully iterated through six Excel files named `tom1.xlsx` through `tom6.xlsx` located in the specified directory.
*   Each Excel file was read into a pandas DataFrame.
*   Each DataFrame was sorted in ascending order based on the columns 'number', 'week', and 'fruitingNode'.
*   The sorted data was successfully saved back to the original Excel files, overwriting the unsorted content.

### Insights or Next Steps

*   All six files are now ordered by 'number', then 'week', then 'fruitingNode'.
*   No further steps are required for this sorting task.
