# Un2Structured Data
This predictor accepts a `text` string and a corresponding JSON `schema`, and generates a structured dictionary that conforms to the schema. An LLM performs the extraction.

## Environment Variables
This predictor uses the following environment variables:

| Name | Notes |
|:--|:--|
| `OPENAI_ORGANIZATION` | Your OpenAI organization ID. |
| `OPENAI_API_KEY` | Your OpenAI API key. |

To create environment variables for the predictor, open a terminal and run the following commands:
```bash
# Run these commands in a terminal:
fxn env create OPENAI_ORGANIZATION <Your OpenAI org id>
fxn env create OPENAI_API_KEY <Your OpenAI api key>
```

> These environment variables will be created as global environment variables, accessible by every predictor you create. You can also create [predictor-specific environment variables](https://docs.fxn.ai/create/secrets).

## Dependencies

We'll use [Marvin](https://github.com/PrefectHQ/marvin) from Prefect, which has mature tooling for this task, along with `datamodel-code-generator` to turn the JSON schema into a Python data model.

```python
# Install Marvin and datamodel-code-generator
%pip install datamodel-code-generator marvin
```

For local testing, we'll use a `.env` file to load the environment variables above.

```python
# Install dotenv
%pip install python-dotenv
```

Then load the environment variables:

```python
from dotenv import load_dotenv

load_dotenv()
```

```text
False
```
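A `False` result from `load_dotenv()` typically means that no variables were loaded, for example because no `.env` file was found or it was empty. Since the implementation reads both variables from the environment, it can help to check for them up front. The following is a minimal sketch; `REQUIRED_VARS` and `missing_env_vars` are hypothetical names introduced here for illustration and are not part of the predictor.

```python
import os

# Names taken from the Environment Variables table above.
REQUIRED_VARS = ("OPENAI_ORGANIZATION", "OPENAI_API_KEY")

def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names in `names` that are unset or empty in `env`."""
    # Hypothetical helper: defaults to the process environment.
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

If the list is non-empty, create the variables with `fxn env create` or add them to the `.env` file before running the implementation cell.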

## Implementation

```python
from datamodel_code_generator import generate, InputFileType, PythonVersion
import marvin
from marvin import ai_model
from os import environ
from pathlib import Path
from tempfile import NamedTemporaryFile

# Configure marvin
marvin.settings.openai.organization = environ["OPENAI_ORGANIZATION"]
marvin.settings.openai.api_key = environ["OPENAI_API_KEY"]

def predict(text: str, schema: str) -> str:
    """
    Convert unstructured text into a structured type.

    Parameters:
        text: Input text.
        schema: JSON schema, as a string.

    Returns:
        str: Output dictionary, serialized as a JSON string.
    """
    # Generate data model
    MODEL_NAME = "UserModel"
    # Rename `x-enumNames` to `x-enum-varnames` so the generator picks up enum member names
    schema = schema.replace("x-enumNames", "x-enum-varnames")
    # Reserve a temporary file path for the generated model source
    with NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
        model_source_path = Path(f.name)
    generate(
        schema,
        class_name=MODEL_NAME,
        input_file_type=InputFileType.JsonSchema,
        target_python_version=PythonVersion.PY_310,
        use_subclass_enum=True,
        output=model_source_path
    )
    # Load the generated source, then remove the temporary file
    model_source_py = model_source_path.read_text()
    model_source_path.unlink()
    # Execute the source in its own namespace (avoids shadowing the `locals` builtin)
    namespace = {}
    exec(model_source_py, namespace, namespace)
    Model = namespace[MODEL_NAME]
    Model.update_forward_refs(**namespace)
    # Parse the text into the model and serialize it to JSON
    result = ai_model(Model)(text)
    return result.json()
```
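The "load into interpreter" step can be tried on its own, without the code generator or the LLM. The sketch below assumes a hand-written source string as a stand-in for the generated model file; `SOURCE` and `load_class` are hypothetical names used only for illustration, and the stand-in class is a plain Python class rather than the generated model.

```python
# Stand-in for the source file that the code generator would write.
SOURCE = '''
class UserModel:
    def __init__(self, name, direction):
        self.name = name
        self.direction = direction
'''

def load_class(source: str, class_name: str):
    """Execute `source` in a fresh namespace and return the named class."""
    namespace = {}
    # Using the same dict for globals and locals lets definitions in the
    # source refer to each other, as in the implementation above.
    exec(source, namespace, namespace)
    return namespace[class_name]

UserModel = load_class(SOURCE, "UserModel")
instance = UserModel(name="Yusuf", direction=1)
print(instance.name, instance.direction)  # prints: Yusuf 1
```

Because `exec` runs whatever source it is given, this approach fits best when the schema comes from a trusted caller.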

## Testing

Let's run our `predict` function to ensure that it works well.

```python
schema = """
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "Command",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "name": {
      "type": ["null", "string"]
    },
    "direction": {
      "$ref": "#/definitions/Direction"
    }
  },
  "definitions": {
    "Direction": {
      "type": "integer",
      "x-enumNames": ["North", "East", "South", "West"],
      "enum": [0, 1, 2, 3]
    }
  }
}
"""

predict("My name is Yusuf and I'm heading East", schema)
```

```text
'{"name": "Yusuf", "direction": 1}'
```
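The result is a JSON string in which `direction` is the integer enum value, so a caller has to decode it and map the value back to a name. A minimal sketch, assuming the `Direction` values from the schema above; the `Direction` enum here is written by hand for illustration and is not the generated class.

```python
import json
from enum import IntEnum

# Hand-written mirror of the `Direction` definition in the schema.
class Direction(IntEnum):
    North = 0
    East = 1
    South = 2
    West = 3

# Example output copied from the test run above.
result = '{"name": "Yusuf", "direction": 1}'

command = json.loads(result)                 # decode the JSON string into a dict
direction = Direction(command["direction"])  # map the integer back to its name
print(command["name"], direction.name)       # prints: Yusuf East
```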

## Configuration
Open a terminal and run the following:
```bash
# Create the predictor on Function
fxn create @username/un2structured un2structured.ipynb --overwrite
```