Generators #2

Open

kaburia opened this issue Aug 19, 2023 · 2 comments
kaburia commented Aug 19, 2023

Use generators in the multiple_measurements function to reduce memory usage.
Alternatively, find a more optimal approach than plain for loops.
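
A minimal sketch of what a generator-based version could look like, assuming retrieve_data returns a pandas DataFrame per station (the method name measurements_generator and the streaming-to-disk usage are illustrative assumptions, not part of the codebase):

import pandas as pd

def measurements_generator(self, stations_list, startDate, endDate, variables,
                           dataset='controlled', aggregate=True):
    # Yield one station's DataFrame at a time instead of holding them all in memory.
    for station in stations_list:
        df = self.retrieve_data(station, startDate, endDate, variables, dataset, aggregate)
        if isinstance(df, pd.DataFrame):
            yield station, df

Hypothetical usage, streaming each station's data straight to disk:

# for station, df in obj.measurements_generator(stations, '2023-01-01', '2023-01-31', ['pr']):
#     df.to_csv(f'{station}.csv')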

kaburia commented Oct 6, 2023

Resorted to multiprocessing:
# Module-level imports required by this method
import multiprocessing as mp

import pandas as pd
from tqdm import tqdm

def multiple_measurements(self, stations_list, csv_file, startDate, endDate, variables, dataset='controlled', aggregate=True):
    """
    Retrieves measurements for multiple stations and saves the aggregated data to a CSV file.

    Parameters:
    -----------
    - stations_list (list): A list of strings containing the names of the stations to retrieve data from.
    - csv_file (str): The name of the CSV file to save the data to (without the '.csv' extension).
    - startDate (str): The start date for the measurements, in the format 'yyyy-mm-dd'.
    - endDate (str): The end date for the measurements, in the format 'yyyy-mm-dd'.
    - variables (list): A list of strings containing the names of the variables to retrieve.
    - dataset (str): The name of the dataset to retrieve the data from. Default is 'controlled'.
    - aggregate (bool): Passed through to retrieve_data. Default is True.

    Returns:
    -----------
    - df (pandas.DataFrame): A DataFrame containing the aggregated data for all stations.

    Raises:
    -----------
    - ValueError: If stations_list is not a list.
    """
    if not isinstance(stations_list, list):
        raise ValueError('Pass in a list')

    pool = mp.Pool(processes=mp.cpu_count())  # Use all available CPU cores

    try:
        results = []
        # Dispatch one retrieve_data task per station and update the progress bar as each finishes.
        with tqdm(total=len(stations_list), desc='Retrieving data for stations') as pbar:
            for station in stations_list:
                results.append(pool.apply_async(
                    self.retrieve_data,
                    args=(station, startDate, endDate, variables, dataset, aggregate),
                    callback=lambda _: pbar.update(1)))

            pool.close()
            pool.join()

        # Collect each worker's result once and keep only those that came back as DataFrames.
        station_frames = [result.get() for result in results]
        df_stats = [frame for frame in station_frames if isinstance(frame, pd.DataFrame)]

        if len(df_stats) > 0:
            df = pd.concat(df_stats, axis=1)
            df.to_csv(f'{csv_file}.csv')
            return df
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        pool.terminate()
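
For reference, a hypothetical call (the client object, station IDs, and variable code below are illustrative, not taken from the codebase):

stations = ['TA00001', 'TA00002', 'TA00003']
df = client.multiple_measurements(stations, 'rainfall_2023', '2023-01-01', '2023-01-31',
                                  ['pr'], dataset='controlled', aggregate=True)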

kaburia commented Jul 20, 2024

The method is well optimized for requesting a single variable across a list of stations; it may not work as well when requesting multiple variables together with a list of multiple stations.
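
One possible workaround, not part of the current implementation: call multiple_measurements once per variable and concatenate the per-variable results column-wise (the helper name and the column-wise merge are assumptions for illustration):

import pandas as pd

def measurements_per_variable(client, stations, csv_file, startDate, endDate, variables):
    # Request one variable at a time, reusing the existing multi-station code path.
    frames = []
    for variable in variables:
        df = client.multiple_measurements(stations, f'{csv_file}_{variable}',
                                          startDate, endDate, [variable])
        if isinstance(df, pd.DataFrame):
            frames.append(df)
    # Merge the per-variable DataFrames side by side, if any were returned.
    if frames:
        return pd.concat(frames, axis=1)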
