# Table of Contents
- [Tools I am using in my everyday work 🧑‍💻](#Tools-I-am-using-in-my-everyday-work-%F0%9F%A7%91%E2%80%8D%F0%9F%92%BB)
- [Tips and Tricks from "Co-Intelligence" book 📖](#Tips-and-Tricks-from-%22Co-Intelligence%22-book-%F0%9F%93%96)
- [Example of using the AI in data processing 🤖](#Example-of-using-the-AI-in-data-processing-%F0%9F%A4%96)
- [Erebus Project ⚛️](#Erebus-Project-%E2%9A%9B%EF%B8%8F)


# Tools I am using in my everyday work 🧑‍💻

1. [Google Gemini](https://gemini.google.com/app) - Pro Subscription
    - very handy if you already use Google tools (Calendar, Gmail, Drive)
2. [JetBrains AI](https://www.jetbrains.com/ai/#plans-and-pricing) - Pro Subscription
    - autocompletion when typing
    - Junie as a coding assistant

## Other tools on the market
1. [Cursor IDE](https://www.cursor.com/).
2. [Claude AI](https://claude.ai/).
3. [DeepSeek](https://www.deepseek.com/).

# Tips and Tricks from "Co-Intelligence" book 📖

["Co-Intelligence: Living and Working with AI" (2024) by Ethan Mollick](https://www.amazon.com/Co-Intelligence-Living-Working-Ethan-Mollick/dp/059371671X)

1. Always invite AI to the table.
    1. Define **Just Me Tasks** (tasks you keep for yourself) and the tasks AI can do instead.
    2. Define **Automated Tasks** that are left entirely to AI.
2. Be the human in the loop.
3. Treat AI like a person (but tell it what kind of person it is).
4. Assume this is the worst AI you will ever use.
5. Provide context and constraints.
    1. Measure twice, cut once.
    2. State your goal (summarize, extract, etc.).
    3. Break the pattern of what the AI was trained on.
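Tips 3 and 5 map directly onto how a prompt is assembled. The sketch below is an illustration, not something taken from the book: `build_messages` is a hypothetical helper that puts the persona into the system message and the goal, context, and constraints into the user message, using the same OpenAI-style `{"role": ..., "content": ...}` message format as the data-processing example in this document. The example values are made up.

```python
def build_messages(persona: str, goal: str, context: str, constraints: list[str]) -> list[dict]:
    """Assemble chat messages: persona -> system message; goal, context, constraints -> user message."""
    # Tip 3: tell the AI what kind of "person" it is
    system_prompt = f"You are {persona}."
    # Tip 5: state the goal, give context, and list explicit constraints
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    user_prompt = (
        f"Goal: {goal}\n\n"
        f"Context:\n{context}\n\n"
        f"Constraints:\n{constraint_lines}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


# Hypothetical example values
messages = build_messages(
    persona="an experienced mineralogist reviewing electron microprobe analyses",
    goal="Summarize the analysis below in three bullet points.",
    context="Oxide weight percentages (wt%) measured at three spots on one mineral grain.",
    constraints=[
        "Do not invent values that are not in the data.",
        "Write 'unknown' when information is missing.",
    ],
)
print(messages[1]["content"])
```

Keeping the persona in the system message and the task details in the user message makes it easy to reuse one persona across many tasks.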


# Example of using the AI in data processing 🤖

````python
import os
import json

import pandas as pd
from dotenv import load_dotenv
from openai import OpenAI


def extract_metadata_from_csv(file_path: str, user_prompt_template: str) -> dict:
    """
    Reads a CSV file, converts it to tab-separated text, and asks an LLM
    (DeepSeek, through its OpenAI-compatible API) to extract structured metadata
    based on a customizable prompt.

    Args:
        file_path (str): The path to the CSV file.
        user_prompt_template (str): A template for the prompt. It must contain a
                                    `{sample_data}` placeholder; literal braces
                                    (e.g. in a JSON example) must be doubled: `{{ }}`.

    Returns:
        dict: A dictionary containing the extracted structured metadata, or an empty dict
              if an error occurs.
    """
    try:
        # Load the CSV data (use the function argument, not the global variable)
        df = pd.read_csv(file_path)
        print(f"Successfully loaded CSV: {file_path}")

        # Replace NaN with empty strings and serialize the table, header row included,
        # as tab-separated text for the prompt
        df = df.fillna('')
        sample_text = df.to_csv(sep='\t', index=False)

        full_prompt = user_prompt_template.format(sample_data=sample_text)

        # The key is read from OPENAI_API_KEY, but it must be a DeepSeek key,
        # because the client is pointed at the DeepSeek endpoint.
        # Check it before creating the client: OpenAI() raises if no key is available.
        api_key = os.getenv("OPENAI_API_KEY")
        if not api_key:
            print("Error: API key not found. Please set the OPENAI_API_KEY environment variable.")
            return {}

        print("\n--- Sending to AI ---")
        client = OpenAI(api_key=api_key, base_url='https://api.deepseek.com')

        chat_completion = client.chat.completions.create(
            model="deepseek-chat",
            messages=[
                {"role": "system",
                 "content": "You are a helpful assistant that extracts structured metadata from data descriptions. Respond only with a JSON object."},
                {"role": "user", "content": full_prompt}
            ],
            # JSON mode: ask the API to return a JSON object
            response_format={"type": "json_object"}
        )

        # Make sure the API returned at least one choice
        if chat_completion.choices:
            ai_response_content = chat_completion.choices[0].message.content
            print("\n--- AI Raw Response ---")
            print(ai_response_content)

            try:
                metadata = json.loads(ai_response_content)
                return metadata
            except json.JSONDecodeError as e:
                print(f"Error decoding JSON from AI response: {e}")
                print(f"AI response was: {ai_response_content}")
                return {}
        else:
            print("Error: No choices returned from the API.")
            return {}

    except FileNotFoundError:
        print(f"Error: CSV file not found at {file_path}")
        return {}
    except pd.errors.EmptyDataError:
        print(f"Error: CSV file at {file_path} is empty.")
        return {}
    except pd.errors.ParserError:
        print(f"Error: Could not parse CSV file at {file_path}. Check file format.")
        return {}
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        return {}


if __name__ == "__main__":
    load_dotenv()

    csv_file_path = "data/LanthaniteNd__R060993-2__Chemistry__Microprobe_Data_Excel__1641.csv"

    # Customizable prompt template for the AI.
    # Instruct the AI to provide a JSON object with specific fields.
    # The AI will interpret the data and suggest appropriate types and descriptions.
    # Literal braces in the JSON example are doubled ({{ }}) because the template
    # goes through str.format().
    custom_prompt = """
    **Instruction**

    Please extract the metadata and composition from the analytical measurement results.

    **Guidelines:**

    1. Extract compositions from the data.
    2. If no units of measurement are provided, make an assumption based on the context.
    3. If a value is zero, include it in the output.
    4. If no analytical method is provided, make your own assumption.
    5. Provide your own quality score for the data based on its completeness and correctness.
    6. Extract ALL measurements and spots available in the data.
    7. Return JSON.

    **Required JSON format:**
    - 'file_description': A brief description of the dataset.
    - 'quality_score': A number between 0 and 1 indicating the quality of the data.
    - 'compositions': An array of objects, where each object describes one oxide or element measurement.
      Each composition object should have:
      - 'name': The oxide or element name.
      - 'value': The value of the oxide or element.
      - 'unit': The unit for numerical data (e.g., 'wt%', 'ppm', 'ppb'). If not applicable, use null.
    - 'summary_insights': Any high-level observations or potential issues (e.g., missing values, unexpected ranges).

    **Example:**

    **Original data:**
    ```
    	zinciteR050492
    	#1	#2	#3
    Ox	Wt	Percents	Average
    TiO2	0.00	0.00	0.00
    Cr2O3	0.03	0.01	0.00
    MnO	3.21	3.25	3.21
    FeO	0.21	0.14	0.17
    ZnO	95.02	95.21	94.60
    Totals	98.48	98.61	97.98
    ```

    **JSON Output:**
    ```json
    {{
      "file_description": "...",
      "quality_score": 0.6,
      "compositions": [
        {{
          "name": "TiO2",
          "value": "0.00",
          "unit": "wt%"
        }}
      ],
      "summary_insights": "..."
    }}
    ```

    **DATA TO EXTRACT:**
    {sample_data}
    """

    extracted_metadata = extract_metadata_from_csv(csv_file_path, custom_prompt)

    if extracted_metadata:
        # Make sure the output directory exists before writing
        os.makedirs('responses', exist_ok=True)
        with open(f'responses/{os.path.basename(csv_file_path)}.json', 'w') as f:
            json.dump(extracted_metadata, f, indent=4)
    else:
        print("\nMetadata extraction failed or returned no data.")
````
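`json.loads` only guarantees that the model returned syntactically valid JSON, not that the object follows the **Required JSON format** requested in the prompt. A minimal post-check could look like the sketch below. It assumes the object shape described in the prompt; the helper name `validate_metadata` and the specific checks are illustrative, not part of the original notebook. It returns a list of problems, and an empty list means the response passed.

```python
REQUIRED_KEYS = {"file_description", "quality_score", "compositions", "summary_insights"}
COMPOSITION_KEYS = {"name", "value", "unit"}


def validate_metadata(metadata: object) -> list[str]:
    """Return a list of problems found in the AI response; an empty list means it passed."""
    if not isinstance(metadata, dict):
        return ["response is not a JSON object"]

    problems = []
    missing = REQUIRED_KEYS - set(metadata)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")

    # quality_score must be a number in [0, 1]; bool is rejected explicitly because it subclasses int
    score = metadata.get("quality_score")
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0 <= score <= 1:
        problems.append(f"invalid quality_score: {score!r}")

    compositions = metadata.get("compositions", [])
    if not isinstance(compositions, list):
        return problems + ["compositions is not an array"]

    for i, item in enumerate(compositions):
        if not isinstance(item, dict):
            problems.append(f"compositions[{i}] is not an object")
            continue
        missing_item = COMPOSITION_KEYS - set(item)
        if missing_item:
            problems.append(f"compositions[{i}] missing keys: {sorted(missing_item)}")
            continue
        # The prompt example sends values as strings ("0.00"), so accept anything float() can parse
        try:
            float(item["value"])
        except (TypeError, ValueError):
            problems.append(f"compositions[{i}] has a non-numeric value: {item['value']!r}")
    return problems


# Hypothetical response in the requested shape
response = {
    "file_description": "Microprobe analysis of a zincite sample.",
    "quality_score": 0.6,
    "compositions": [{"name": "TiO2", "value": "0.00", "unit": "wt%"}],
    "summary_insights": "...",
}
print(validate_metadata(response))  # → []
print(validate_metadata({"quality_score": 1.5}))
```

In the notebook, a non-empty problem list could be handled like the other error paths, for example by printing the problems instead of writing the file to `responses/`.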


```
Successfully loaded CSV: data/LanthaniteNd__R060993-2__Chemistry__Microprobe_Data_Excel__1641.csv
```


# Erebus Project ⚛️

The base prototype is available [here](https://keep-rocking-ai.vercel.app/).

1. As a first step, work primarily with EPMA (electron probe microanalysis) and whole-rock data.
2. Allow users to store their data and share it with others.
3. Parse all data, extract structured information, and provide insights.

# Thanks for joining! 🙇‍♂️