# 1. Stats

To generate stats, we read the summary files for each dataset from the `summaries` folder. This folder contains one subfolder per dataset, named after the dataset. Each dataset folder contains the summary files extracted by the respective scripts. The summary files are in JSON format; we load them and generate the stats.

We generate per-dataset stats as well as general stats that combine all the datasets.

Among the stats, we will generate the following:

* [ ] 1. Number of images `all_ds`
* [ ] 2. Number of objects `all_ds`
* [ ] 3. Number of classes `all_ds`
* [ ] 4. Number of instances per class `per_ds` 
* [ ] 5. Average number of instances per image `all_ds` 
<!-- * [ ] 6. Bounding box area distribution `all_ds` -->


The results are saved under the `summaries` folder, in the respective dataset folders.
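
The summary files already contain these numbers, so the notebook only reads them. For reference, here is a minimal sketch of how the listed stats could be derived from a COCO-style annotation dict. It assumes the usual `images`, `annotations`, and `categories` keys; the function name `basic_stats` and the output key names are illustrative and are not taken from the extraction scripts.

```python
from collections import Counter


def basic_stats(coco: dict) -> dict:
    """Compute the stats listed above from a COCO-style annotation dict (illustrative)."""
    images = coco.get("images", [])
    annotations = coco.get("annotations", [])
    categories = coco.get("categories", [])

    # Map category ids to readable names; fall back to the raw id if a name is missing.
    id_to_name = {c["id"]: c["name"] for c in categories}
    per_class = Counter(
        id_to_name.get(a["category_id"], str(a["category_id"])) for a in annotations
    )

    return {
        "num_images": len(images),
        "num_objects": len(annotations),
        "num_classes": len(categories),
        "instances_per_class": dict(per_class),
        # Guard against an empty image list to avoid dividing by zero.
        "avg_instances_per_image": len(annotations) / len(images) if images else 0.0,
    }


# Toy input, made up purely for illustration.
toy = {
    "images": [{"id": 1}, {"id": 2}],
    "annotations": [
        {"image_id": 1, "category_id": 1},
        {"image_id": 1, "category_id": 2},
        {"image_id": 2, "category_id": 1},
    ],
    "categories": [{"id": 1, "name": "car"}, {"id": 2, "name": "person"}],
}
toy_stats = basic_stats(toy)
```

Counting instances per class with a `Counter` keeps the sketch to a single pass over the annotations.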

In [1]:
# imports
import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
from pathlib import Path
import json
from tqdm.auto import tqdm
import warnings
sns.set()

from utils import stats_tools
warnings.filterwarnings('ignore')



In [2]:
# set up paths
current_dir = Path.cwd()
summaries_path = current_dir / 'summaries'
summaries_path.mkdir(parents=True, exist_ok=True)
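
The next cell maps each dataset to its summary file through `stats_tools.get_dataset_to_file_paths`, whose implementation is not shown in this notebook. The sketch below is a hypothetical stand-in and not the actual helper. It assumes the layout described in the introduction (one subfolder per dataset, each holding JSON summary files) and, as a further assumption, picks the first JSON file in each folder. The name `find_summary_files` is made up for illustration.

```python
import tempfile
from pathlib import Path


def find_summary_files(summaries_dir) -> dict:
    """Map each dataset folder name to one JSON summary file inside it (hypothetical)."""
    mapping = {}
    for dataset_dir in sorted(Path(summaries_dir).iterdir()):
        if not dataset_dir.is_dir():
            continue  # skip stray files such as a global CSV
        json_files = sorted(dataset_dir.glob("*.json"))
        if json_files:
            # Assumption: one summary file per dataset; take the first alphabetically.
            mapping[dataset_dir.name] = str(json_files[0])
    return mapping


# Build a throwaway folder layout to exercise the function.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "visdrone").mkdir()
    (root / "visdrone" / "visdrone_stats.json").write_text("{}")
    (root / "empty_ds").mkdir()  # no JSON inside, so it is skipped
    found = find_summary_files(root)
    found_names = sorted(found)
```

Folders without any JSON file are skipped, so a loop over the mapping never has to handle a missing summary.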

In [3]:
# get dataset to file paths
dataset_to_file_paths = stats_tools.get_dataset_to_file_paths(str(summaries_path))

# len(dataset_to_file_paths), dataset_to_file_paths
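
The loop in the next cell keeps only the plain (scalar) entries of each summary and accumulates them into one table with a row per dataset. Here is a minimal sketch of that idea, using `pd.concat` in place of `stats_tools.merge_df`, whose behaviour is not shown here. The summary keys and values in the toy data are invented for illustration.

```python
import pandas as pd


def scalar_row(summary: dict, dataset_name: str) -> pd.DataFrame:
    """Return a one-row DataFrame holding only the scalar entries of a summary."""
    scalars = {k: v for k, v in summary.items() if not isinstance(v, (list, dict))}
    scalars["dataset_name"] = dataset_name
    return pd.DataFrame([scalars])


# Toy summaries; the nested entries (dict and list) are dropped by scalar_row.
summaries = {
    "ds_a": {"num_images": 10, "num_objects": 40, "per_category_stats": {"car": 40}},
    "ds_b": {"num_images": 5, "num_objects": 15, "categories": ["car"]},
}

rows = [scalar_row(summary, name) for name, summary in summaries.items()]
# Stack the per-dataset rows; ignore_index gives a clean 0..n-1 index.
global_df = pd.concat(rows, ignore_index=True)
```

Collecting the rows in a list and concatenating once avoids rebuilding the DataFrame on every iteration.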

In [4]:

global_summary_plain_value_cols_df = pd.DataFrame()

for dataset_name, file_path in dataset_to_file_paths.items():
    
    print("*"*20 + f"{dataset_name}" + "*"*20)
    # load data
    summary = stats_tools.load_summary(file_path=file_path)
    dataset_to_summary = {dataset_name: summary}

    # columns
    all_columns = list(summary.keys())
    # keep only the entries that hold plain values (not lists or dicts)
    plain_value_cols = {k: v for k, v in summary.items() if not isinstance(v, (list, dict))}
    plain_value_cols["dataset_name"] = dataset_name

    plain_value_cols_df = pd.DataFrame(plain_value_cols, index=[0])
    # merge with the rows accumulated from previous datasets
    global_summary_plain_value_cols_df = stats_tools.merge_df(df1=global_summary_plain_value_cols_df,
                                                              df2=plain_value_cols_df)
    
    # get per_category_stats and plot them
    # categories = summary['categories']
    per_category_stats = summary['per_category_stats']
    stats_tools.plot_and_save_per_category_stats(per_category_stats=per_category_stats,
                                                 dataset_name=dataset_name,
                                                 save_path=summaries_path)
    
    
    # bbox_areas_stats = summary['bbox_areas_stats']

    


# save global_summary_plain_value_cols_df to CSV
stats_tools.save_df_to_csv(df=global_summary_plain_value_cols_df,
                           save_path=summaries_path,
                           file_name='global_summary_plain_value_cols.csv')

********************visdrone********************
[INFO] Loading visdrone_converted_to_coco_format_stats.json
[INFO] Loaded visdrone_converted_to_coco_format_stats.json
[INFO] Category stats ....
[INFO] saving plot ....
[INFO] Saved.
********************dota********************
[INFO] Loading dotav2_converted_to_coco_stats.json
[INFO] Loaded dotav2_converted_to_coco_stats.json
[INFO] Category stats ....
[INFO] saving plot ....
[INFO] Saved.
********************kaist_pedestrian********************
[INFO] Loading kaist_converted_to_coco_format_stats.json
[INFO] Loaded kaist_converted_to_coco_format_stats.json
[INFO] Category stats ....
[INFO] saving plot ....
[INFO] Saved.
********************skydata********************
[INFO] Loading train_DET_stats.json
[INFO] Loaded train_DET_stats.json
[INFO] Category stats ....
[INFO] saving plot ....
[INFO] Saved.
********************vhr_10********************
[INFO] Loading vhr_annotations_stats.json
[INFO] Loaded vhr_annotations_stats.json
[INFO] 