<a href="https://colab.research.google.com/github/patrickfleith/datapipes/blob/main/How_to_use_Anthropic_Claude_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## How to use Anthropic Claude models
In this notebook we look into:
1. The basics of using an Anthropic model (Claude 3.5 Sonnet and Claude 3.5 Haiku) with just a few lines of code.
2. Which settings you can play with to tune the model's behaviour for your use case.


**Table of contents**
>[Anthropic Setup](#folderId=1tp_6Ep8ifTiLdokzSoJhJccd212GnCHV&updateTitle=true&scrollTo=trNIWx9Av671)

>[Simple inference with Claude model from Anthropic](#folderId=1tp_6Ep8ifTiLdokzSoJhJccd212GnCHV&updateTitle=true&scrollTo=UFa55z60wt72)

>[Advanced Options](#folderId=1tp_6Ep8ifTiLdokzSoJhJccd212GnCHV&updateTitle=true&scrollTo=sLX3RCdD1FHF)

>[Streaming](#folderId=1tp_6Ep8ifTiLdokzSoJhJccd212GnCHV&updateTitle=true&scrollTo=n2o2YPQl1t0s)



## Anthropic Setup

In [33]:
# First we have to install it (it is not available by default in Google Colab)
!pip install anthropic --quiet

In [34]:
import anthropic

To use Anthropic Claude models, you'll need to create an API key and configure it in your Google Colab Secrets.


1. Log in or create an Anthropic account, and configure a billing method, on the Anthropic Console [here](https://console.anthropic.com/login).
2. You'll also have to add a small amount of credit (for example $5) before you can start using the API.
3. Create a secret API key [here](https://console.anthropic.com/settings/keys).
4. Open your Colab secrets (click on the key icon on the left).
5. Give the secret a name, for instance `ANTHROPIC_API_KEY`, and paste the key in `Value`.
6. Toggle `Notebook access` to give this specific notebook access to the API key.

🔑 Note that this API key will now be available in your secrets every time you open or create a Colab notebook. However, you'll still need to grant access explicitly to each notebook.


💸 Using an Anthropic model you will get charged! Use a small and cheap model such as `claude-3-5-haiku-latest` for testing and learning, then switch to a better model if needed for more complex tasks.


In [35]:
from google.colab import userdata
ANTHROPIC_API_KEY = userdata.get('ANTHROPIC_API_KEY')
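
The cell above relies on `google.colab`, which only exists inside Colab. As a minimal sketch, assuming you might also run this notebook locally with the key exported as an environment variable, a hypothetical helper `load_api_key` (my own name, not part of any library) could try the Colab secrets first and fall back to `os.environ`:

```python
import os


def load_api_key(name="ANTHROPIC_API_KEY"):
    """Return the API key from Colab secrets, or from the environment outside Colab.

    `load_api_key` is a hypothetical helper written for this notebook.
    """
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
    except ImportError:
        # Assumption: outside Colab the key was exported as an environment variable.
        key = os.environ.get(name)
    else:
        key = userdata.get(name)
    if not key:
        raise RuntimeError(f"No API key found under the name {name!r}")
    return key

# Usage: ANTHROPIC_API_KEY = load_api_key()
```

Keeping the lookup in one function means the rest of the notebook does not need to know where it is running.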

## Simple inference with Claude model from Anthropic
Text generation is very simple. You need to create an Anthropic `client` object. You then call the `client.messages.create()` method and pass **the two most important parameters:**
- 🧠 `model`: the large language model being used.
- 💬 `messages`: the list of user messages and assistant responses.

The interface is very similar to OpenAI's.

Note: with Anthropic, however, the optional system prompt does not go into the messages list. Instead, it is passed as a separate **system** argument, as illustrated below (more on system prompts with Claude [here](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/system-prompts)).
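
To make the shape of a multi-turn request concrete, here is a small sketch. `build_request` is a hypothetical helper of mine (not part of the `anthropic` SDK), and the conversation content is made up for illustration. It only assembles the keyword arguments, with the system prompt kept outside the `messages` list:

```python
def build_request(system_prompt, turns, model="claude-3-5-haiku-latest", max_tokens=1024):
    """Assemble keyword arguments for client.messages.create().

    `turns` is a list of (role, text) pairs in conversation order.
    `build_request` is a hypothetical helper, not part of the anthropic SDK.
    """
    messages = []
    for role, text in turns:
        if role not in ("user", "assistant"):
            # The system prompt is passed separately, never as a message role.
            raise ValueError(f"Unsupported role in messages: {role!r}")
        messages.append({"role": role, "content": text})
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": system_prompt,
        "messages": messages,
    }


# Made-up conversation, for illustration only.
request = build_request(
    "You are a seasoned data scientist at a Fortune 500 company.",
    [
        ("user", "What is stationarity?"),
        ("assistant", "A series is stationary when its statistical properties do not change over time."),
        ("user", "How can I test for it?"),
    ],
)
# With a configured client: response = client.messages.create(**request)
```

Previous assistant replies are sent back as `assistant` messages, which is how the model sees the history of the conversation.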

### Anthropic Models
I recommend testing models in the following order (from cheapest to most capable and most expensive):
1. `claude-3-5-haiku-latest`: affordable, intelligent and blazing fast.
    - Knowledge cut-off date: July 2024
2. `claude-3-5-sonnet-latest`: Anthropic's most intelligent model, with the highest level of capability.
    - Knowledge cut-off date: April 2024

Both models have a 200k-token context window (roughly 150k English words), which caps how much text a single request can hold.

For pricing and more info, look [here](https://docs.anthropic.com/en/docs/about-claude/models#model-comparison-table).
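
As a rough back-of-the-envelope check, the "200k tokens for about 150k words" rule of thumb above works out to 0.75 words per token. The sketch below uses that ratio to guess whether a prompt fits. It is only an estimate based on a whitespace word count, not a real tokenizer. The function names are mine, and I am assuming the generated tokens share the same window as the input:

```python
import math

CONTEXT_WINDOW_TOKENS = 200_000      # context window quoted in this notebook
WORDS_PER_TOKEN = 150_000 / 200_000  # 0.75, from the rule of thumb in this notebook


def estimate_tokens(text):
    """Rough token estimate from a whitespace word count (not a real tokenizer)."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


def fits_in_context(text, reserved_output_tokens=2048):
    """Check whether the estimated prompt plus a reserved output budget fits the window."""
    return estimate_tokens(text) + reserved_output_tokens <= CONTEXT_WINDOW_TOKENS


print(estimate_tokens("What would you advise to do as EDA for time series?"))
print(fits_in_context("word " * 1000))
```

For anything close to the limit, treat this as a sanity check only, since real token counts depend on the tokenizer and the language of the text.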

In [36]:
import anthropic

client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)

response = client.messages.create(
    model="claude-3-5-haiku-latest",
    max_tokens=2048,
    system="You are a seasoned data scientist at a Fortune 500 company.", # <-- system prompt
    messages=[
        {"role": "user", "content": "What would you advise to do as EDA for time series?"}
    ]
)

print(response.content[0].text)

When conducting Exploratory Data Analysis (EDA) for time series data, I recommend the following comprehensive approach:

1. Preliminary Data Inspection
- Check data timespan and frequency
- Verify time column formatting
- Identify missing values or gaps
- Validate timestamp consistency

2. Visualization Techniques
- Line plots of time series
- Seasonal decomposition plot
- Box plots by time periods
- Heatmaps of periodicity
- Autocorrelation and partial autocorrelation plots
- Rolling statistics visualization

3. Statistical Characteristics
- Compute descriptive statistics
- Calculate:
  - Mean
  - Median
  - Standard deviation
  - Minimum/maximum values
- Check for:
  - Stationarity
  - Trends
  - Seasonality
  - Cyclical patterns

4. Advanced Analysis
- Time series decomposition
- Lag plots
- Rolling window analysis
- Correlation with external variables
- Spectral analysis

5. Distribution Assessment
- Check data distribution
- Identify outliers
- Normality tests
- Skewness and kurto
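
The `print(response.content[0].text)` call above reads only the first content block. As a small sketch, the hypothetical helper `summarize_response` below joins all text blocks and also reports the token counts found in `response.usage`. To keep it runnable without an API call, I use a stand-in object with the same attribute names instead of a real response:

```python
from types import SimpleNamespace


def summarize_response(response):
    """Collect the text blocks and token usage of a Messages API response."""
    text = "".join(block.text for block in response.content if block.type == "text")
    return {
        "text": text,
        "input_tokens": response.usage.input_tokens,
        "output_tokens": response.usage.output_tokens,
    }


# Stand-in object (made-up values) mimicking the attributes of a real response.
fake_response = SimpleNamespace(
    content=[
        SimpleNamespace(type="text", text="Hello "),
        SimpleNamespace(type="text", text="world"),
    ],
    usage=SimpleNamespace(input_tokens=12, output_tokens=2),
)
summary = summarize_response(fake_response)
print(summary)
```

With a real call, you would pass the `response` returned by `client.messages.create()` instead of `fake_response`.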

## Advanced Options
Here are some more advanced parameters you can use.

In [40]:
import anthropic

client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)

response = client.messages.create(
    model="claude-3-5-haiku-latest",
    system="You are the world's best poet",
    messages=[
        {"role": "user", "content": "Write a very short poem about an astronaut on the Moon"}
    ],
    max_tokens=512,
    temperature=0.8,
    top_k=10
)

In [41]:
print(response.content[0].text)

Here's a short poem about an astronaut on the Moon:

Footprints in dust, silence so deep,
One small step where shadows creep,
Earth hangs distant, blue and bright,
Alone I stand in lunar light.


For the full documentation, check [here](https://docs.anthropic.com/en/api/messages). Below are the parameters I consider most important to be aware of.

**Temperature**

- The `temperature` is the amount of randomness injected into the response. It defaults to 1.0 and ranges from 0.0 to 1.0. Use a temperature closer to 0.0 for analytical or multiple-choice tasks, and closer to 1.0 for creative and generative tasks. Note that even with a temperature of 0.0, the results will not be fully deterministic.
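
To build some intuition for what "randomness" means here, this is the textbook way temperature is usually described: the model's scores (logits) are divided by the temperature before being turned into probabilities. It is only an illustration with made-up scores, not Anthropic's actual implementation, and unlike the API this sketch needs a strictly positive temperature:

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by the temperature."""
    if temperature <= 0:
        raise ValueError("this sketch needs a strictly positive temperature")
    scaled = [x / temperature for x in logits]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - highest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
cold = softmax_with_temperature(toy_logits, 0.2)
warm = softmax_with_temperature(toy_logits, 1.0)
print("temperature 0.2:", [round(p, 3) for p in cold])
print("temperature 1.0:", [round(p, 3) for p in warm])
```

At the lower temperature almost all the probability goes to the top-scoring token, which is why low values give more predictable answers.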

**Maximum number of tokens**

- `max_tokens` is the maximum number of tokens to generate before stopping. Note that the model may stop *before* reaching this limit.
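
To find out whether a reply ended on its own or was cut off by the limit, you can look at the `stop_reason` field of the response. The sketch below uses a hypothetical helper, `describe_stop`, and stand-in objects so that it runs without an API call:

```python
from types import SimpleNamespace


def describe_stop(response):
    """Explain in plain words why generation stopped."""
    reasons = {
        "end_turn": "the model finished its answer",
        "max_tokens": "the answer was cut off at the max_tokens limit",
        "stop_sequence": "a custom stop sequence was reached",
    }
    return reasons.get(response.stop_reason, f"other reason: {response.stop_reason}")


# Stand-in objects mimicking the `stop_reason` attribute of a real response.
print(describe_stop(SimpleNamespace(stop_reason="end_turn")))
print(describe_stop(SimpleNamespace(stop_reason="max_tokens")))
```

If you see `max_tokens` here, raising the limit or asking for a shorter answer are the two obvious options.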

**Top K**

- `top_k` (integer): only sample from the top K options for each subsequent token.
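
Here is a toy illustration of the idea behind top-K sampling, with made-up probabilities: keep only the K most likely tokens, renormalise, then sample among them. This is the generic technique as I understand it, not a description of how Anthropic implements it, and the function names are mine:

```python
import random


def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalise their probabilities."""
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {token: p / total for token, p in top}


def sample_top_k(probs, k, rng=random):
    """Draw one token from the k most likely candidates."""
    filtered = top_k_filter(probs, k)
    tokens = list(filtered)
    weights = list(filtered.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


toy_probs = {"Moon": 0.5, "stars": 0.3, "cheese": 0.2}  # made-up values
filtered = top_k_filter(toy_probs, 2)
print(filtered)
print(sample_top_k(toy_probs, 2))
```

With `k=2`, the least likely token can never be picked, which removes the long tail of unlikely continuations.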

**System**

- `system` (string): a system prompt you can use to give the assistant a role or some context.

## Streaming

Without streaming, you have to wait until the model has generated the full response before you can see it.
With **streaming**, you see each token as soon as it is generated, like in the Claude chat interface. Streaming provides a much better user experience.
If you don't have a user-facing app, you may not need it.

In [42]:
import anthropic

client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)

with client.messages.stream(
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, tell me about Data Science tips for EDA of time series"}],
    model="claude-3-5-haiku-latest",
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Here are some key tips for Exploratory Data Analysis (EDA) of time series data:

1. Visualization Techniques
- Line plots to show overall trend
- Seasonal decomposition plots
- Autocorrelation Function (ACF) plot
- Partial Autocorrelation Function (PACF) plot
- Rolling statistics visualization
- Box plots for seasonal patterns

2. Time Series Characteristics Analysis
- Check stationarity (Augmented Dickey-Fuller test)
- Identify trend components
- Detect seasonality
- Analyze cyclical patterns
- Examine lag correlations

3. Data Preprocessing
- Handle missing values
- Normalize/standardize time series
- Remove outliers
- Smooth data if needed
- Create lag features
- Seasonal differencing

4. Statistical Techniques
- Check time series distributions
- Calculate rolling mean/variance
- Compute moving averages
- Use statistical tests for trend/seasonality
- Analyze time series components

5. Advanced Visualization Tools
- Seaborn
- Plotly
- Matplotlib
- Pandas time series plotting
- Intera
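
If you also need the complete text once streaming has finished (to store it or post-process it), one simple option is to collect the chunks while printing them. `consume_stream` is a hypothetical helper of mine that works on any iterable of text chunks, so the sketch uses a plain list as a stand-in for a real stream:

```python
def consume_stream(text_chunks):
    """Print chunks as they arrive and return the full text once the stream ends."""
    parts = []
    for chunk in text_chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # final newline after the streamed text
    return "".join(parts)


# Stand-in chunks so the sketch runs without an API call.
full_text = consume_stream(["Hello", ", ", "world"])
```

With a real request, you would call `consume_stream(stream.text_stream)` inside the `with client.messages.stream(...) as stream:` block.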