# NOMAD Data Retrieval Module

This notebook provides reusable UI components and logic for retrieving data through the NOMAD API. Other dashboard notebooks can load it with `%run`.

## Usage

```python
# Import the authentication module (required)
%run 'nomad_auth.ipynb'

# Import the data retrieval module
%run 'nomad_data_retrieval.ipynb'

# The following variables and functions are now available:
# - create_data_tab(auth_state): Creates and returns the data retrieval UI and data state
# - get_author_names(client, df): Helper to transform author IDs to names
# - fetch_user_details(client, user_id): Fetch user details from NOMAD API
```

## Requirements

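- The `nomad_auth.ipynb` notebook must be run first, because this module uses the `api_client`, `current_token`, and `current_user_info` variables it defines
- The `nomad_data.py` module must be importable (for example, placed in the same directory as this notebook), since it provides `get_hysprint_data`, `load_attributions`, and `save_attributions`
- The `ipywidgets`, `pandas`, and `plotly` packages must be installed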

In [None]:
# Import required libraries
import ipywidgets as widgets
from ipywidgets import HBox, VBox, Button, Text, Label
from IPython.display import display, clear_output
import pandas as pd
import plotly.express as px
from datetime import datetime

# Check if nomad_auth has been imported
try:
    # These variables should be defined in nomad_auth.ipynb
    _ = api_client  # Just to check if it exists
    _ = current_token
    _ = current_user_info
except NameError:
    print("⚠️ Warning: nomad_auth.ipynb must be run before this notebook. Run '%run nomad_auth.ipynb' first.")

# Import NOMAD data functionality
try:
    from nomad_data import get_hysprint_data, load_attributions, save_attributions
except ImportError:
    print("⚠️ Warning: nomad_data.py module not found. create_data_tab() will raise NameError until it can be imported.")
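When `nomad_data.py` cannot be imported, `load_attributions` and `get_hysprint_data` stay undefined, so `create_data_tab()` fails as soon as it is called. If you only want to render the UI, one option is to define placeholders in the `except ImportError:` branch. The sketch below is hypothetical: `load_attributions` and `get_hysprint_data` mirror how this notebook calls them, the return types (a dict, and a DataFrame with `upload_date` and `main_author` columns) are inferred from this notebook's usage rather than from `nomad_data.py`, and `save_attributions` is never called here, so its signature is a guess.

```python
import pandas as pd

# Hypothetical placeholders for the names imported from nomad_data.py.
# They only let the UI render; the real module may behave differently.
def load_attributions():
    # Assumed: attributions are a mapping; start with none
    return {}

def save_attributions(attributions):
    # No-op: nothing is persisted in preview mode (signature is a guess)
    return None

def get_hysprint_data(client, max_entries=None):
    # Return an empty frame with the columns this notebook reads, so the
    # fetch handler reports "No data retrieved" instead of raising NameError
    return pd.DataFrame(columns=['upload_date', 'main_author'])
```

With these stubs in place the tab renders, and once authenticated, clicking **Fetch HySprint Data** reports that no data was retrieved.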

## Utility Functions for Data Retrieval

In [None]:
def fetch_user_details(client, user_id):
    """Return a display name for a NOMAD user.
    
    Args:
        client: NOMAD API client instance
        user_id: ID of the user to look up
        
    Returns:
        str: "First Last" if available, otherwise the username, the email,
        or the user ID as a last resort
    """
    try:
        # Query the users endpoint through the API client
        user_data = client.make_request('get', f'users/{user_id}')

        # `or ''` guards against fields that are present but set to None
        first_name = user_data.get('first_name') or ''
        last_name = user_data.get('last_name') or ''
        name = f"{first_name} {last_name}".strip()

        if not name:  # Fall back when both name fields are empty
            name = user_data.get('username') or user_data.get('email') or f"ID: {user_id}"
        return name
    except Exception:
        # Fail silently and fall back to the raw user ID
        return str(user_id)

def get_author_names(client, df):
    """Transform main_author IDs to names in the dataframe.
    
    Args:
        client: NOMAD API client instance
        df: DataFrame containing main_author column with user IDs
        
    Returns:
        pd.DataFrame: DataFrame with additional author_name column
    """
    if client is None or df is None or df.empty or 'main_author' not in df.columns:
        return df
    
    # Create a copy to avoid modifying the original dataframe
    result_df = df.copy()
    
    # Get unique author IDs to minimize API calls
    unique_authors = df['main_author'].unique()
    author_map = {}
    
    # Create a mapping from author ID to name
    for author_id in unique_authors:
        if pd.isna(author_id) or author_id == '':  # pd.isna also covers None
            author_map[author_id] = 'Unknown'
        else:
            author_map[author_id] = fetch_user_details(client, author_id)
    
    # Add the author name column based on the mapping
    result_df['author_name'] = result_df['main_author'].map(author_map)
    
    return result_df
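The two helpers above can be exercised offline with a fake client. In this minimal sketch, `StubClient` is a hypothetical stand-in that only mimics `make_request`, and `display_name` is a hypothetical standalone copy of the fallback order in `fetch_user_details` (full name, then username, email, and finally the raw ID). The dictionary comprehension follows the same "resolve each distinct ID once, then map" pattern as `get_author_names`, so the call counter shows how many requests a real client would receive.

```python
import pandas as pd

class StubClient:
    """Hypothetical stand-in for the NOMAD API client; only make_request is mimicked."""

    def __init__(self, users):
        self.users = users
        self.calls = 0  # number of simulated API requests

    def make_request(self, method, path):
        self.calls += 1
        user_id = path.rsplit('/', 1)[-1]
        if user_id not in self.users:
            raise KeyError(user_id)  # simulate an unknown user / failed request
        return self.users[user_id]

def display_name(client, user_id):
    """Same fallback order as fetch_user_details: full name, username, email, raw ID."""
    try:
        user = client.make_request('get', f'users/{user_id}')
        name = f"{user.get('first_name') or ''} {user.get('last_name') or ''}".strip()
        return name or user.get('username') or user.get('email') or f"ID: {user_id}"
    except Exception:
        return str(user_id)

client = StubClient({
    'u1': {'first_name': 'Ada', 'last_name': 'Lovelace'},
    'u2': {'username': 'jdoe'},
})
df = pd.DataFrame({'main_author': ['u1', 'u2', 'u1', 'u3', None]})

# Resolve each distinct ID once, then map the results onto every row
names = {
    a: 'Unknown' if pd.isna(a) or a == '' else display_name(client, a)
    for a in df['main_author'].unique()
}
df['author_name'] = df['main_author'].map(names.get)

print(df['author_name'].tolist())  # ['Ada Lovelace', 'jdoe', 'Ada Lovelace', 'u3', 'Unknown']
print(client.calls)                # 3: one request per distinct non-empty ID
```

Resolving unique IDs first keeps the number of API requests proportional to the number of distinct authors rather than the number of samples.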

## Data Retrieval Tab Component

In [None]:
def create_data_tab(auth_state):
    """Create the data retrieval tab.

    Args:
        auth_state: Dictionary of zero-argument callables containing at minimum:
                    - 'is_authenticated': returns True if the user is authenticated
                    - 'client': returns the NOMAD API client instance

    Returns:
        tuple: (data_ui, data_state), the UI widget and the data state dictionary
    """
    # Fetch button
    fetch_button = widgets.Button(
        description='Fetch HySprint Data',
        disabled=False,
        button_style='info',
        tooltip='Click to fetch all HySprint sample data from NOMAD',
        icon='database'
    )
    
    # Time period selector for visualizations
    time_period = widgets.RadioButtons(
        options=['Monthly', 'Yearly'],
        value='Monthly',
        description='Time Period:',
        disabled=False,
        layout=widgets.Layout(visibility='hidden')
    )
    
    # Status output
    status_output = widgets.Output()
    
    # Visualization output
    viz_output = widgets.Output()
    
    # Data store
    data_state = {
        'df': None,
        'attributions': load_attributions()
    }
    
    # Function to update the time distribution plot
    def update_time_plot(period='Monthly'):
        with viz_output:
            clear_output()
            if data_state['df'] is None or data_state['df'].empty:
                print("No data available to plot.")
                return
                
            # Make sure upload_date is datetime
            df = data_state['df'].copy()
            df['upload_date'] = pd.to_datetime(df['upload_date'])
            
            # Group by month or year based on selection
            if period == 'Monthly':
                df['period'] = df['upload_date'].dt.to_period('M').astype(str)
                title = 'Samples Uploaded per Month'
                x_title = 'Month'
            else:  # Yearly
                df['period'] = df['upload_date'].dt.to_period('Y').astype(str)
                title = 'Samples Uploaded per Year'
                x_title = 'Year'
                
            # Count samples per period
            samples_by_period = df.groupby('period').size().reset_index(name='count')
            
            # Create the bar plot with plotly
            fig = px.bar(samples_by_period, x='period', y='count', 
                        title=title,
                        labels={'period': x_title, 'count': 'Number of Samples'},
                        color_discrete_sequence=['#4CAF50'])
            
            # Improve layout
            fig.update_layout(
                xaxis_tickangle=-45,
                plot_bgcolor='white',
                height=400,
                width=800,
                margin=dict(t=50, b=100)
            )
            
            fig.show()
    
    # Handler for time period change; observe(names='value') already filters events
    def on_time_period_change(change):
        update_time_plot(change['new'])
    
    time_period.observe(on_time_period_change, names='value')
    
    # Fetch button click handler
    def on_fetch_button_click(b):
        with status_output:
            clear_output()
            
            if not auth_state['is_authenticated']():
                print("❌ Please authenticate first")
                return
            
            print("Fetching all available HySprint sample data records from NOMAD...")
            try:
                client = auth_state['client']()
                # get_hysprint_data (in nomad_data.py) displays tqdm progress bars while fetching
                df = get_hysprint_data(client, max_entries=None)

                if df is None or df.empty:
                    print("❌ No data retrieved")
                else:
                    # Add author names to the dataframe
                    data_state['df'] = get_author_names(client, df)
                    print(f"✓ Retrieved {len(df)} HySprint samples")

                    # Make the time period selector visible
                    time_period.layout.visibility = 'visible'

                    # Draw the time distribution plot for the selected period
                    update_time_plot(time_period.value)
            except Exception as e:
                print(f"❌ Error retrieving data: {e}")
                import traceback
                traceback.print_exc()
    
    fetch_button.on_click(on_fetch_button_click)
    
    # Assemble the tab layout
    data_ui = widgets.VBox([
        widgets.HTML("<h2>HySprint Data Retrieval</h2>"),
        fetch_button,
        status_output,
        widgets.HBox([time_period]),
        viz_output
    ])
    
    return data_ui, data_state
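The bucketing step inside `update_time_plot` can be checked without widgets or plotly. This sketch repeats its `to_period`/`groupby` logic on a few synthetic dates; `count_by_period` is a hypothetical helper name, not part of this module.

```python
import pandas as pd

def count_by_period(frame, period='Monthly'):
    """Hypothetical helper: count rows per month ('Monthly') or per year (otherwise)."""
    freq = 'M' if period == 'Monthly' else 'Y'
    dates = pd.to_datetime(frame['upload_date'])
    labelled = frame.assign(period=dates.dt.to_period(freq).astype(str))
    return labelled.groupby('period').size().reset_index(name='count')

# Synthetic upload dates, for illustration only
samples = pd.DataFrame({'upload_date': ['2023-01-15', '2023-01-30', '2023-02-02', '2024-03-10']})

monthly = count_by_period(samples, 'Monthly')
yearly = count_by_period(samples, 'Yearly')
print(monthly)
print(yearly)
```

The resulting frames have the same `period` and `count` columns that `update_time_plot` passes to `px.bar`. Because the periods are converted to strings, months with no uploads are simply absent rather than shown as zero-height bars.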

## Example Usage

Here's an example of how to use this data retrieval module in another notebook:

In [None]:
# This cell demonstrates how to use the data retrieval module in another notebook
# Not meant to be executed in this notebook directly

'''
# In your dashboard notebook, run both auth and data retrieval modules
%run ./nomad_auth.ipynb
%run ./nomad_data_retrieval.ipynb

# Create a wrapper for authentication state
auth_state = {
    "is_authenticated": lambda: api_client is not None,
    "client": lambda: api_client,
    "token": lambda: current_token,
    "user_info": lambda: current_user_info
}

# Create the data tab UI and state
data_ui, data_state = create_data_tab(auth_state)

# Display the UI
display(data_ui)
'''
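The wrapper stores callables rather than values because each lambda looks up `api_client` when it is called, not when the dictionary is built. Assuming `nomad_auth.ipynb` rebinds the global `api_client` on login, a tab created before logging in will still see the client afterwards. This minimal sketch demonstrates that late binding; `object()` is a placeholder, not a real NOMAD client.

```python
# Stand-in for the global that nomad_auth.ipynb sets (placeholder value here)
api_client = None

auth_state = {
    "is_authenticated": lambda: api_client is not None,
    "client": lambda: api_client,
}

# The tab would be built here, before anyone has logged in
logged_in_at_build = auth_state["is_authenticated"]()

# Simulate a later login by rebinding the global (placeholder object, not a real client)
api_client = object()
logged_in_at_click = auth_state["is_authenticated"]()

print(logged_in_at_build, logged_in_at_click)  # False True
```

Passing `api_client` itself instead of a lambda would freeze the value at construction time, and the fetch button would keep reporting "Please authenticate first".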

## Direct UI Preview

To preview this component on its own, run the cell below; it builds the UI from the authentication state created by `nomad_auth.ipynb`.

In [None]:
# To preview the data retrieval UI, remove the surrounding triple quotes and run this cell
# Note: nomad_auth.ipynb must be run first

'''
# Create wrapper for authentication state
auth_state = {
    "is_authenticated": lambda: api_client is not None,
    "client": lambda: api_client,
    "token": lambda: current_token,
    "user_info": lambda: current_user_info
}

# Create and display the data tab
data_ui, _ = create_data_tab(auth_state)
display(data_ui)
'''