<a href="https://colab.research.google.com/github/seungwoosoon/SmartFarmProject/blob/AI/growth_indicator_FruitNumPerPlant.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
For each folder from "/content/drive/MyDrive/착과군/tom1" through "/content/drive/MyDrive/착과군/tom6", combine the "file_attributes" and "growth_indicators" fields of every JSON file into one Excel file per "tom" folder, with each variable as a column and each JSON file's data as a row.

## Authenticate Google Drive

### Subtask:
Mount Google Drive to access files.


**Reasoning**:
Mount Google Drive to access the files stored there.



In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Define input and output paths

### Subtask:
Specify the base directory in Google Drive and the output directory for Excel files.


**Reasoning**:
Define the base and output directories, and create the output directory if it doesn't exist.



In [2]:
# Create the folder that will hold the output dataset for the fruit-set group (착과군)

import os

base_dir = '/content/drive/MyDrive/착과군'
output_dir = '/content/drive/MyDrive/착과군/output_excel_files'

# Create the output directory; exist_ok=True makes this a no-op if it already exists
os.makedirs(output_dir, exist_ok=True)

print(f"Base directory: {base_dir}")
print(f"Output directory: {output_dir}")

Base directory: /content/drive/MyDrive/착과군
Output directory: /content/drive/MyDrive/착과군/output_excel_files


## Iterate through 'tom' folders

### Subtask:
Loop through each folder from 'tom1' to 'tom6'.


**Reasoning**:
Create a list of folder names and iterate through them as instructed.



In [3]:
# Build the list of the six per-farm folders that hold the tomato growth data

folder_names = [f'tom{i}' for i in range(1, 7)]

for folder_name in folder_names:
    print(f"Processing folder: {folder_name}")
    # The next steps will involve processing the files within this folder
    # For now, just print the folder name to confirm the loop is working

Processing folder: tom1
Processing folder: tom2
Processing folder: tom3
Processing folder: tom4
Processing folder: tom5
Processing folder: tom6


## Process JSON files in each folder

### Subtask:
For each folder, find all JSON files.


**Reasoning**:
Iterate through the specified folders and find all JSON files within each folder.



In [4]:
# Group the mixed JSON files by farm folder

import os

folder_names = [f'tom{i}' for i in range(1, 7)]

for folder_name in folder_names:
    print(f"Processing folder: {folder_name}")
    folder_path = os.path.join(base_dir, folder_name)
    entries = os.listdir(folder_path)
    json_files = [entry for entry in entries if entry.endswith('.json')]
    print(f"Found JSON files: {json_files}")

Processing folder: tom1
Found JSON files: ['V001_tom1_41_100_c_08_20211015_16_01104532_38102607.json', 'V001_tom1_50_077_c_11_20211218_25_02111748_38102607.json', 'V001_tom1_44_009_c_08_20211101_19_03153552_49122255.json', 'V001_tom1_00_072_c_14_20220101_27_02101912_38102607.json', 'V001_tom1_44_010_c_09_20211101_19_01154238_49122255.json', 'V001_tom1_45_077_c_11_20211113_20_03153313_38102607.json', 'V001_tom1_42_003_c_06_20211018_17_01090451_49122255.json', 'V001_tom1_47_030_c_14_20211123_22_03153859_49122255.json', 'V001_tom1_39_021_c_07_20210930_14_00103721_49122255.json', 'V001_tom1_47_042_c_09_20211124_22_04110033_40158887.json', 'V001_tom1_47_071_c_12_20211126_22_03124842_38102607.json', 'V001_tom1_41_035_c_08_20211012_16_03102002_49122255.json', 'V001_tom1_46_059_c_12_20211118_21_04095545_40158887.json', 'V001_tom1_40_035_c_08_20211007_15_00141434_40158887.json', 'V001_tom1_44_052_c_09_20211106_19_03092028_40158887.json', 'V001_tom1_00_072_c_14_20220101_27_00101912_38102607.json
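
`os.listdir` returns entries in arbitrary order, so the row order of each DataFrame depends on the file system. A minimal sketch of a deterministic alternative follows. It assumes a hypothetical helper name, `list_json_files`, that is not part of the notebook, and uses only the standard library:

```python
from pathlib import Path


def list_json_files(folder):
    """Return the .json files directly inside `folder`, sorted by name.

    `list_json_files` is a hypothetical helper, not part of the notebook.
    """
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix == ".json"
    )
```

Sorting by name is only one possible choice; the later sort on 'number', 'week', and 'fruitingNode' makes the initial order irrelevant for the final Excel files.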

## Extract data from JSON files

### Subtask:
Read each JSON file and extract the required variables from "file_attributes" and "growth_indicators".


**Reasoning**:
Read each JSON file, extract the required data from "file_attributes" and "growth_indicators", and store it in a list of dictionaries.



In [5]:
# From each JSON file, select the fields to extract: file_attributes (information
# identifying the tomato plant) and growth_indicators (growth measurements for each tomato)

import json

folder_names = [f'tom{i}' for i in range(1, 7)]

for folder_name in folder_names:
    print(f"Processing folder: {folder_name}")
    folder_path = os.path.join(base_dir, folder_name)
    entries = os.listdir(folder_path)
    json_files = [entry for entry in entries if entry.endswith('.json')]

    all_data = []  # List to store data for the current folder

    for json_file in json_files:
        file_path = os.path.join(folder_path, json_file)
        with open(file_path, 'r') as f:
            data = json.load(f)

        extracted_data = {}
        if 'file_attributes' in data:
            extracted_data.update(data['file_attributes'])
        if 'growth_indicators' in data:
            extracted_data.update(data['growth_indicators'])

        all_data.append(extracted_data)

    # At this point, all_data contains the extracted data for the current folder.
    # The next step will be to convert this to a DataFrame and save it to Excel.
    print(f"Extracted data for {len(all_data)} JSON files in {folder_name}")

Processing folder: tom1
Extracted data for 3803 JSON files in tom1
Processing folder: tom2
Extracted data for 2107 JSON files in tom2
Processing folder: tom3
Extracted data for 4758 JSON files in tom3
Processing folder: tom4
Extracted data for 3747 JSON files in tom4
Processing folder: tom5
Extracted data for 2074 JSON files in tom5
Processing folder: tom6
Extracted data for 2511 JSON files in tom6


## Create a DataFrame

### Subtask:
Organize the extracted data into a pandas DataFrame.


**Reasoning**:
Convert the extracted data for the current folder into a pandas DataFrame and display the head of the DataFrame.



In [8]:
# Extract the file_attributes and growth_indicators values from each JSON file
# and collect them as columns in an Excel file

import json  # needed for json.load below
import os
import pandas as pd

folder_names = [f'tom{i}' for i in range(1, 7)]

for folder_name in folder_names:
    print(f"Processing folder: {folder_name}")
    folder_path = os.path.join(base_dir, folder_name)
    entries = os.listdir(folder_path)
    json_files = [entry for entry in entries if entry.endswith('.json')]

    all_data = []  # List to store data for the current folder

    for json_file in json_files:
        file_path = os.path.join(folder_path, json_file)
        with open(file_path, 'r') as f:
            data = json.load(f)

        extracted_data = {}
        if 'file_attributes' in data:
            extracted_data.update(data['file_attributes'])
        if 'growth_indicators' in data:
            extracted_data.update(data['growth_indicators'])

        all_data.append(extracted_data)

    df = pd.DataFrame(all_data)
    print(f"DataFrame for {folder_name}:")
    display(df.head())

    # Save inside the loop so that every folder gets its own Excel file
    output_filename = f"{folder_name}.xlsx"
    output_filepath = os.path.join(output_dir, output_filename)
    df.to_excel(output_filepath, index=False)
    print(f"Successfully saved {folder_name} data to {output_filepath}")

Processing folder: tom1
DataFrame for tom1:


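|   | fileVersion | farmId | week | number | type | flowerCluster | date | numberOfFruitPerPlant | numberOfFruitPerTruss | fruitingNode | numberOfTheFlower |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | V001 | tom1 | 41 | 100 | c | 8 | 20211015 | 3 | 8 | 8 | NaN |
| 1 | V001 | tom1 | 50 | 77 | c | 11 | 20211218 | 3 | 11 | 11 | NaN |
| 2 | V001 | tom1 | 44 | 9 | c | 8 | 20211101 | 3 | 8 | 8 | NaN |
| 3 | V001 | tom1 | 0 | 72 | c | 14 | 20220101 | 2 | 14 | 14 | NaN |
| 4 | V001 | tom1 | 44 | 10 | c | 9 | 20211101 | 3 | 9 | 9 | NaN |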


Successfully saved tom1 data to /content/drive/MyDrive/착과군/output_excel_files/tom1.xlsx
Processing folder: tom2
DataFrame for tom2:


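|   | fileVersion | farmId | week | number | type | flowerCluster | date | numberOfFruitPerPlant | numberOfTheFlower | numberOfFruitPerTruss | fruitingNode |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | V001 | tom2 | 43 | 65 | c | 7 | 20211028 | 2 | 1.0 | 7 | 7 |
| 1 | V001 | tom2 | 46 | 16 | c | 11 | 20211117 | 3 | NaN | 11 | 11 |
| 2 | V001 | tom2 | 50 | 17 | c | 11 | 20211218 | 3 | NaN | 11 | 11 |
| 3 | V001 | tom2 | 38 | 59 | c | 5 | 20210923 | 3 | NaN | 5 | 5 |
| 4 | V001 | tom2 | 51 | 58 | c | 12 | 20211224 | 3 | NaN | 12 | 12 |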


Successfully saved tom2 data to /content/drive/MyDrive/착과군/output_excel_files/tom2.xlsx
Processing folder: tom3
DataFrame for tom3:


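|   | fileVersion | farmId | week | number | type | flowerCluster | date | numberOfFruitPerPlant | numberOfFruitPerTruss | fruitingNode | numberOfTheFlower |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | V001 | tom3 | 39 | 98 | c | 7 | 20210929 | 2 | 7 | 7 | NaN |
| 1 | V001 | tom3 | 41 | 13 | c | 8 | 20211011 | 3 | 8 | 8 | NaN |
| 2 | V001 | tom3 | 39 | 19 | c | 8 | 20211003 | 0 | 8 | 8 | 3.0 |
| 3 | V001 | tom3 | 47 | 52 | c | 10 | 20211125 | 3 | 10 | 10 | NaN |
| 4 | V001 | tom3 | 50 | 3 | c | 12 | 20211213 | 3 | 12 | 12 | NaN |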


Successfully saved tom3 data to /content/drive/MyDrive/착과군/output_excel_files/tom3.xlsx
Processing folder: tom4
DataFrame for tom4:


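|   | fileVersion | farmId | week | number | type | flowerCluster | date | numberOfFruitPerPlant | numberOfFruitPerTruss | fruitingNode | numberOfTheFlower |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | V001 | tom4 | 51 | 64 | c | 10 | 20211225 | 3 | 10 | 10 | NaN |
| 1 | V001 | tom4 | 48 | 78 | c | 8 | 20211204 | 2 | 8 | 8 | NaN |
| 2 | V001 | tom4 | 42 | 80 | c | 6 | 20211024 | 3 | 6 | 6 | NaN |
| 3 | V001 | tom4 | 42 | 56 | c | 6 | 20211023 | 3 | 6 | 6 | NaN |
| 4 | V001 | tom4 | 43 | 52 | c | 2 | 20211030 | 3 | 2 | 2 | NaN |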


Successfully saved tom4 data to /content/drive/MyDrive/착과군/output_excel_files/tom4.xlsx
Processing folder: tom5
DataFrame for tom5:


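|   | fileVersion | farmId | week | number | type | flowerCluster | date | numberOfFruitPerPlant | numberOfTheFlower | numberOfFruitPerTruss | fruitingNode |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | V001 | tom5 | 47 | 78 | c | 7 | 20211128 | 2 | 1.0 | 7 | 7 |
| 1 | V001 | tom5 | 49 | 62 | c | 5 | 20211210 | 3 | NaN | 5 | 5 |
| 2 | V001 | tom5 | 47 | 24 | c | 3 | 20211125 | 3 | NaN | 3 | 3 |
| 3 | V001 | tom5 | 46 | 15 | c | 4 | 20211118 | 3 | NaN | 4 | 4 |
| 4 | V001 | tom5 | 41 | 70 | c | 1 | 20211017 | 3 | NaN | 1 | 1 |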


Successfully saved tom5 data to /content/drive/MyDrive/착과군/output_excel_files/tom5.xlsx
Processing folder: tom6
DataFrame for tom6:


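|   | fileVersion | farmId | week | number | type | flowerCluster | date | numberOfFruitPerPlant | numberOfFruitPerTruss | fruitingNode | numberOfTheFlower |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | V001 | tom6 | 41 | 6 | c | 2 | 20211012 | 2 | 2 | 2 | NaN |
| 1 | V001 | tom6 | 42 | 86 | c | 1 | 20211019 | 3 | 1 | 1 | NaN |
| 2 | V001 | tom6 | 45 | 35 | c | 4 | 20211112 | 3 | 4 | 4 | NaN |
| 3 | V001 | tom6 | 43 | 86 | c | 1 | 20211026 | 3 | 1 | 1 | NaN |
| 4 | V001 | tom6 | 46 | 26 | c | 6 | 20211118 | 2 | 6 | 6 | NaN |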


Successfully saved tom6 data to /content/drive/MyDrive/착과군/output_excel_files/tom6.xlsx
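
The previews above show `numberOfTheFlower` in different column positions and mostly empty, which is what `pd.DataFrame` produces when only some of the input dictionaries carry a given key. The sketch below reproduces that behaviour on made-up records (the values and the `tomA` farm id are illustrative, not taken from the dataset) and shows one possible way to pin a common column order with `reindex`:

```python
import pandas as pd

# Illustrative records: only the second one carries numberOfTheFlower.
records = [
    {"farmId": "tomA", "week": 41, "fruitingNode": 8},
    {"farmId": "tomA", "week": 43, "numberOfTheFlower": 1, "fruitingNode": 7},
]

# Keys missing from a record become NaN in that row.
df = pd.DataFrame(records)

# Assumed target order; adjust to the columns you actually need.
columns = ["farmId", "week", "fruitingNode", "numberOfTheFlower"]
df = df.reindex(columns=columns)
```

Applying the same `columns` list to every folder would give all six Excel files an identical layout, which may simplify later concatenation.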


## Write DataFrame to Excel

### Subtask:
Save the DataFrame to an Excel file named after the 'tom' folder.


**Reasoning**:
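Construct the output file path and save the DataFrame to an Excel file. Run on its own after the loop, this cell only saves the last folder's DataFrame (`tom6`), which is why the same save step also appears inside the loop in the cell above.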



In [7]:
# After extraction, write the Excel file for a tom farm
# (outside the loop, folder_name and df refer to the last folder processed)

output_filename = f"{folder_name}.xlsx"
output_filepath = os.path.join(output_dir, output_filename)
df.to_excel(output_filepath, index=False)
print(f"Successfully saved {folder_name} data to {output_filepath}")

Successfully saved tom6 data to /content/drive/MyDrive/착과군/output_excel_files/tom6.xlsx


## Summary:

### Data Analysis Key Findings

*   The process successfully extracted data from the `file_attributes` and `growth_indicators` fields of JSON files located within the `tom1` through `tom6` folders.
*   Data from each JSON file was organized into a row in a pandas DataFrame, with the variables from the extracted fields becoming the columns.
*   Separate Excel files were generated for each "tom" folder (`tom1.xlsx` through `tom6.xlsx`), containing the combined data from the JSON files within that folder.

### Insights or Next Steps

*   The structured Excel files are now ready for further analysis or integration into other workflows.
*   Consider adding error handling for potential issues like missing fields in JSON files or file read errors.
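
The error handling suggested above could look roughly like the following sketch. `load_records` is a hypothetical helper, not code from the notebook, and reading the files as UTF-8 is an assumption about the dataset:

```python
import json
from pathlib import Path


def load_records(folder):
    """Read every .json file in `folder` into a list of flat dicts.

    Hypothetical helper: unreadable or malformed files are collected in
    `skipped` instead of stopping the whole run.
    """
    records, skipped = [], []
    for path in sorted(Path(folder).glob("*.json")):
        try:
            # Assumption: the files are UTF-8 encoded.
            with open(path, "r", encoding="utf-8") as f:
                data = json.load(f)
        except (OSError, UnicodeDecodeError, json.JSONDecodeError) as exc:
            skipped.append((path.name, str(exc)))
            continue

        row = {}
        for section in ("file_attributes", "growth_indicators"):
            # A missing or non-dict section is simply ignored.
            value = data.get(section) if isinstance(data, dict) else None
            if isinstance(value, dict):
                row.update(value)
        records.append(row)
    return records, skipped
```

A call such as `records, skipped = load_records(folder_path)` would then feed `pd.DataFrame(records)`, while `skipped` lists the file names that need a manual look.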


# Task
Sort the Excel files located at "/content/drive/MyDrive/착과군/output_excel_files/tom1.xlsx" through "/content/drive/MyDrive/착과군/output_excel_files/tom6.xlsx" by the columns 'number', 'week', and 'fruitingNode' in ascending order.

## Iterate through 'tom' folders

### Subtask:
Loop through each folder name from 'tom1' to 'tom6'.


**Reasoning**:
Iterate through the specified folders.



In [9]:
folder_names = [f'tom{i}' for i in range(1, 7)]

for folder_name in folder_names:
    print(f"Processing folder: {folder_name}")
    # The subsequent steps will involve reading, sorting, and saving the Excel file

Processing folder: tom1
Processing folder: tom2
Processing folder: tom3
Processing folder: tom4
Processing folder: tom5
Processing folder: tom6


**Reasoning**:
Construct the full path to the Excel file for the current folder.



In [10]:
import os

folder_names = [f'tom{i}' for i in range(1, 7)]
output_dir = '/content/drive/MyDrive/착과군/output_excel_files'

for folder_name in folder_names:
    print(f"Processing folder: {folder_name}")
    output_filename = f"{folder_name}.xlsx"
    output_filepath = os.path.join(output_dir, output_filename)
    print(f"Processing file: {output_filepath}")
    # The next steps will involve reading the Excel file

Processing folder: tom1
Processing file: /content/drive/MyDrive/착과군/output_excel_files/tom1.xlsx
Processing folder: tom2
Processing file: /content/drive/MyDrive/착과군/output_excel_files/tom2.xlsx
Processing folder: tom3
Processing file: /content/drive/MyDrive/착과군/output_excel_files/tom3.xlsx
Processing folder: tom4
Processing file: /content/drive/MyDrive/착과군/output_excel_files/tom4.xlsx
Processing folder: tom5
Processing file: /content/drive/MyDrive/착과군/output_excel_files/tom5.xlsx
Processing folder: tom6
Processing file: /content/drive/MyDrive/착과군/output_excel_files/tom6.xlsx


**Reasoning**:
Read the Excel file into a pandas DataFrame.



In [11]:
# Read each Excel file into a DataFrame
import pandas as pd
import os

folder_names = [f'tom{i}' for i in range(1, 7)]
output_dir = '/content/drive/MyDrive/착과군/output_excel_files'

for folder_name in folder_names:
    print(f"Processing folder: {folder_name}")
    output_filename = f"{folder_name}.xlsx"
    output_filepath = os.path.join(output_dir, output_filename)
    print(f"Processing file: {output_filepath}")
    df = pd.read_excel(output_filepath)
    print(f"Read {len(df)} rows from {output_filename}")
    # The next steps will involve sorting and saving the DataFrame

Processing folder: tom1
Processing file: /content/drive/MyDrive/착과군/output_excel_files/tom1.xlsx
Read 3803 rows from tom1.xlsx
Processing folder: tom2
Processing file: /content/drive/MyDrive/착과군/output_excel_files/tom2.xlsx
Read 2107 rows from tom2.xlsx
Processing folder: tom3
Processing file: /content/drive/MyDrive/착과군/output_excel_files/tom3.xlsx
Read 4758 rows from tom3.xlsx
Processing folder: tom4
Processing file: /content/drive/MyDrive/착과군/output_excel_files/tom4.xlsx
Read 3747 rows from tom4.xlsx
Processing folder: tom5
Processing file: /content/drive/MyDrive/착과군/output_excel_files/tom5.xlsx
Read 2074 rows from tom5.xlsx
Processing folder: tom6
Processing file: /content/drive/MyDrive/착과군/output_excel_files/tom6.xlsx
Read 2511 rows from tom6.xlsx


**Reasoning**:
Sort the DataFrame by the specified columns.



In [12]:
# Read each tom farm's Excel file and sort by plant number, then week, then fruiting node
folder_names = [f'tom{i}' for i in range(1, 7)]
output_dir = '/content/drive/MyDrive/착과군/output_excel_files'

for folder_name in folder_names:
    print(f"Processing folder: {folder_name}")
    output_filename = f"{folder_name}.xlsx"
    output_filepath = os.path.join(output_dir, output_filename)
    print(f"Processing file: {output_filepath}")
    df = pd.read_excel(output_filepath)
    print(f"Read {len(df)} rows from {output_filename}")

    df_sorted = df.sort_values(by=['number', 'week', 'fruitingNode'], ascending=True)
    print(f"Sorted DataFrame for {folder_name}")
    # The next steps will involve saving the sorted DataFrame

Processing folder: tom1
Processing file: /content/drive/MyDrive/착과군/output_excel_files/tom1.xlsx
Read 3803 rows from tom1.xlsx
Sorted DataFrame for tom1
Processing folder: tom2
Processing file: /content/drive/MyDrive/착과군/output_excel_files/tom2.xlsx
Read 2107 rows from tom2.xlsx
Sorted DataFrame for tom2
Processing folder: tom3
Processing file: /content/drive/MyDrive/착과군/output_excel_files/tom3.xlsx
Read 4758 rows from tom3.xlsx
Sorted DataFrame for tom3
Processing folder: tom4
Processing file: /content/drive/MyDrive/착과군/output_excel_files/tom4.xlsx
Read 3747 rows from tom4.xlsx
Sorted DataFrame for tom4
Processing folder: tom5
Processing file: /content/drive/MyDrive/착과군/output_excel_files/tom5.xlsx
Read 2074 rows from tom5.xlsx
Sorted DataFrame for tom5
Processing folder: tom6
Processing file: /content/drive/MyDrive/착과군/output_excel_files/tom6.xlsx
Read 2511 rows from tom6.xlsx
Sorted DataFrame for tom6


**Reasoning**:
Save the sorted DataFrame back to the original Excel file, overwriting the unsorted file.



In [13]:
# Overwrite the previous data file with the newly sorted Excel file
folder_names = [f'tom{i}' for i in range(1, 7)]
output_dir = '/content/drive/MyDrive/착과군/output_excel_files'

for folder_name in folder_names:
    print(f"Processing folder: {folder_name}")
    output_filename = f"{folder_name}.xlsx"
    output_filepath = os.path.join(output_dir, output_filename)
    print(f"Processing file: {output_filepath}")
    df = pd.read_excel(output_filepath)
    print(f"Read {len(df)} rows from {output_filename}")

    df_sorted = df.sort_values(by=['number', 'week', 'fruitingNode'], ascending=True)
    print(f"Sorted DataFrame for {folder_name}")

    df_sorted.to_excel(output_filepath, index=False)
    print(f"Successfully saved sorted data to {output_filepath}")

Processing folder: tom1
Processing file: /content/drive/MyDrive/착과군/output_excel_files/tom1.xlsx
Read 3803 rows from tom1.xlsx
Sorted DataFrame for tom1
Successfully saved sorted data to /content/drive/MyDrive/착과군/output_excel_files/tom1.xlsx
Processing folder: tom2
Processing file: /content/drive/MyDrive/착과군/output_excel_files/tom2.xlsx
Read 2107 rows from tom2.xlsx
Sorted DataFrame for tom2
Successfully saved sorted data to /content/drive/MyDrive/착과군/output_excel_files/tom2.xlsx
Processing folder: tom3
Processing file: /content/drive/MyDrive/착과군/output_excel_files/tom3.xlsx
Read 4758 rows from tom3.xlsx
Sorted DataFrame for tom3
Successfully saved sorted data to /content/drive/MyDrive/착과군/output_excel_files/tom3.xlsx
Processing folder: tom4
Processing file: /content/drive/MyDrive/착과군/output_excel_files/tom4.xlsx
Read 3747 rows from tom4.xlsx
Sorted DataFrame for tom4
Successfully saved sorted data to /content/drive/MyDrive/착과군/output_excel_file

## Summary:

### Data Analysis Key Findings

*   The analysis successfully iterated through six Excel files named `tom1.xlsx` through `tom6.xlsx` located in the specified directory.
*   Each Excel file was read into a pandas DataFrame.
*   Each DataFrame was sorted in ascending order based on the columns 'number', 'week', and 'fruitingNode'.
*   The sorted data was successfully saved back to the original Excel files, overwriting the unsorted content.
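
The multi-key sort described in the findings above can be sketched on a toy frame. The rows are illustrative values only, and `reset_index` is added here just to make the result easy to read; the notebook does not need it because it saves with `index=False`:

```python
import pandas as pd

# Toy rows (illustrative values only)
df = pd.DataFrame({
    "number": [2, 1, 1, 1],
    "week": [40, 41, 40, 40],
    "fruitingNode": [5, 7, 9, 6],
})

# Sort by plant number first, then week, then fruiting node (all ascending)
df_sorted = (
    df.sort_values(by=["number", "week", "fruitingNode"], ascending=True)
      .reset_index(drop=True)
)
```

Rows are compared on 'number' first; 'week' only breaks ties within the same plant, and 'fruitingNode' only breaks ties within the same plant and week.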

### Insights or Next Steps

*   The sorting process was completed successfully for all specified files, ensuring the data within each file is ordered according to the required columns.
*   No further steps are immediately required for this specific sorting task, as the data has been successfully processed and saved.
