# How To Use Python Workspace

## Working With Data Inputs

The workspace is a managed environment designed for the development of code that will later be used in a Python Transformation or for analytical purposes. 

It allows users to interact with their data using Python in a Jupyter Notebook interface.


There are a few basic concepts users need to understand:

- **Table Input Mapping**: This is used to select tables from the Keboola project Storage to be accessed in the workspace. Once selected, clicking "Load Data" will make these tables available in the `in/tables/` folder within the workspace.
- **File Input Mapping**: This is used to select files from the Keboola project Storage to be accessed in the workspace. Once selected, clicking "Load Data" will make these files available in the `in/files/` folder within the workspace.
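
To confirm what "Load Data" actually placed in the workspace, a quick directory listing can help. The following is a minimal sketch, assuming the `in/tables/` and `in/files/` folders are reachable relative to the notebook's working directory; the helper name `list_inputs` and its `data_dir` parameter are hypothetical and only serve this illustration.

```python
from pathlib import Path


def list_inputs(data_dir="."):
    """Return the sorted file names found under in/tables and in/files."""
    base = Path(data_dir)  # assumption: the in/ folder sits under this directory
    found = {}
    for kind in ("tables", "files"):
        folder = base / "in" / kind
        if folder.is_dir():
            found[kind] = sorted(p.name for p in folder.iterdir() if p.is_file())
        else:
            # The folder is absent when nothing was loaded for this mapping
            found[kind] = []
    return found


print(list_inputs())
```

An empty list for either key suggests that the corresponding input mapping has not been configured or that the data has not been loaded yet.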


### Using Python Libraries:

You can use any Python library available on PyPI. If a library is missing from the Keboola Python Workspace image (i.e., `import library` fails), install it by running `!pip install library` in a notebook cell. To use the same library in a Python Transformation later, add the package to the Packages section of the Python Transformation configuration.
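
Before installing anything, it can be useful to check which libraries are already importable. The sketch below uses the standard library's `importlib.util.find_spec`; the helper name `missing_packages` is hypothetical. It works on top-level import names, which do not always match the package name on PyPI.

```python
import importlib.util


def missing_packages(import_names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


# Any name printed here would be a candidate for `!pip install ...`
print(missing_packages(["json", "pandas"]))
```

Keeping a list of the packages you had to install makes it easier to fill in the Packages section of the Python Transformation later.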


__*This tutorial will guide you on how to load these input data into a Pandas DataFrame and display the DataFrame for further analysis.*__



In [None]:
import pandas as pd
import os
from keboola.component import CommonInterface

# Initialize CommonInterface
ci = CommonInterface()

# Load input tables
input_tables = ci.get_input_tables_definitions()

# Example: Load the first input table into a DataFrame
if input_tables:
    input_table_path = input_tables[0].full_path
    # Alternatively, you can provide a direct path to your file, such as 'in/tables/my_table.csv'
    df = pd.read_csv(input_table_path)
    
    # Display the DataFrame
    print("Data loaded from:", input_table_path)
    display(df)
else:
    print("No input tables found. Please configure Table Input Mapping in the workspace configuration.")

# Load input files
input_files = ci.get_input_files_definitions()

# Example: List all input files available
if input_files:
    print("Input files available:")
    for file_def in input_files:
        print(f" - {file_def.full_path}")
else:
    print("No input files found. Please configure File Input Mapping in the workspace configuration.")


---
## Producing Data Outputs

Typical Python transformation code __uses some data from input__, __processes it__, and __produces an output__ which is then __loaded back to your Keboola Project Storage__. 

To achieve this, write your outputs to the `out/tables/` or `out/files/` folders and then, in the Python Transformation, configure the Table Output Mapping or File Output Mapping accordingly.

---

For your transformation to create an object in Keboola Table Storage (the database backend), you need to produce a valid CSV file in the `out/tables/` folder.

To store any other type of file in Keboola File Storage, you need to produce the file in the `out/files/` folder.
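
The two output folders can be illustrated with a small sketch that writes one CSV table and one arbitrary file. It uses only the standard library to stay self-contained; the helper `write_outputs`, its `data_dir` parameter, and the file names `result.csv` and `summary.json` are hypothetical, and the `out/` folders are assumed to be relative to the working directory.

```python
import csv
import json
from pathlib import Path


def write_outputs(rows, summary, data_dir="."):
    """Write rows (a non-empty list of dicts) as a CSV table and summary as a JSON file."""
    base = Path(data_dir)  # assumption: out/ lives under this directory
    tables_out = base / "out" / "tables"
    files_out = base / "out" / "files"
    tables_out.mkdir(parents=True, exist_ok=True)
    files_out.mkdir(parents=True, exist_ok=True)

    # A table destined for Table Storage: a CSV file with a header row
    table_path = tables_out / "result.csv"
    with open(table_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

    # Any other kind of file goes to out/files
    file_path = files_out / "summary.json"
    file_path.write_text(json.dumps(summary), encoding="utf-8")
    return table_path, file_path
```

For example, `write_outputs([{"id": 1, "name": "a"}], {"rows": 1})` would create `out/tables/result.csv` and `out/files/summary.json`. With pandas, `df.to_csv(path, index=False)` plays the same role as the CSV-writing step here.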

---


__*The sample code below demonstrates how to load the input data, process it, and create the necessary outputs. It:*__

1. Initializes the `CommonInterface`.
2. Sets up `logging` to report on the processing steps.
3. Loads the input tables, adds a new column to each, and saves the processed DataFrame as a CSV file to `out/tables`.
4. Loads the input files and copies each one to the `out/files` directory under a new name, simulating a processing step.

In [None]:
import pandas as pd
import os
from keboola.component import CommonInterface
import logging

# Initialize CommonInterface
ci = CommonInterface()

# Set up logging
logging.basicConfig(level=logging.INFO)

# Load input tables
input_tables = ci.get_input_tables_definitions()

# Process input tables and produce output tables
if input_tables:
    for table_def in input_tables:
        # Read the input table
        df = pd.read_csv(table_def.full_path)
        
        # Example processing: Add a new column
        df['processed'] = True
        
        # Define the output table path inside out/tables
        output_table_name = f"processed_{table_def.name}"
        output_table_path = os.path.join(ci.tables_out_path, output_table_name)
        
        # Save the processed DataFrame to the output path
        df.to_csv(output_table_path, index=False)
        
        logging.info(f"Processed table saved to {output_table_path}")
else:
    logging.info("No input tables found. Please configure Table Input Mapping in the workspace configuration.")

# Load input files
input_files = ci.get_input_files_definitions()

# Process input files and produce output files
if input_files:
    for file_def in input_files:
        # Example processing: Just copy the file to the output directory with a new name
        output_file_path = os.path.join(ci.files_out_path, f"processed_{os.path.basename(file_def.full_path)}")
        
        # Copy file content to new file
        with open(file_def.full_path, 'rb') as src_file:
            with open(output_file_path, 'wb') as dst_file:
                dst_file.write(src_file.read())
        
        logging.info(f"Processed file saved to {output_file_path}")
else:
    logging.info("No input files found. Please configure File Input Mapping in the workspace configuration.")


## Creating Python Transformation

You can use the Jupyter Notebook as you're used to: develop in multiple cells, execute pieces of code separately, etc. However, to create code that can be used in a Keboola Python Transformation, you'll need to assemble the code later and copy-paste it manually to your Keboola Python Transformation configuration.

### Steps to Follow:

1. **Develop and Test in Jupyter Notebook**:
   - Use multiple cells to develop and test your code.
   - Execute pieces of code separately to ensure correctness.
   

2. **Assemble and Copy-Paste Code**:
   - Once your code is ready, assemble it into a single script.
   - Copy-paste the assembled code into the Keboola Python Transformation configuration.
   

3. **Configure Input and Output Mappings**:
   - Configure the input mappings to specify which data objects should be loaded.
   - Configure the output mappings to specify where the processed data should be stored.
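
As an illustration of step 2, the code developed across several notebook cells could be assembled into a single script along these lines. This is only a sketch under stated assumptions: the function names `transform` and `main`, the `data_dir` parameter, and the `processed_` prefix are hypothetical, paths are assumed to be relative to the working directory, and the standard `csv` module stands in for pandas to keep the example self-contained.

```python
import csv
from pathlib import Path


def transform(rows):
    """Example processing step: mark every row as processed."""
    return [{**row, "processed": "True"} for row in rows]


def main(data_dir="."):
    """Read every CSV in in/tables, transform it, and write it to out/tables."""
    base = Path(data_dir)
    in_dir = base / "in" / "tables"
    out_dir = base / "out" / "tables"
    if not in_dir.is_dir():
        return []  # nothing was loaded by the input mapping

    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(in_dir.glob("*.csv")):
        with open(src, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            fieldnames = list(reader.fieldnames or [])
            rows = transform(list(reader))
        if "processed" not in fieldnames:
            fieldnames.append("processed")

        dst = out_dir / f"processed_{src.name}"
        with open(dst, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written.append(dst)
    return written


if __name__ == "__main__":
    main()
```

Keeping the processing logic in a separate function such as `transform` makes it easier to test that logic in a notebook cell and then paste the whole script into the transformation unchanged.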

### Execution Process in Keboola:

When you execute the transformation, it runs a Python job that:
1. **Loads Input Objects**: Loads the data specified in the input mappings.
2. **Executes Your Code**: Runs your Python script (and installs additional packages if specified in the Packages section).
3. **Unloads Outputs**: Stores the processed data in the Keboola Storage using the definitions provided in the output mappings.

By following these steps, you can develop and test your code efficiently in Jupyter Notebook and then deploy it seamlessly in Keboola Python Transformation.
