# Llama on LOCAL

## Example 1  - LLAMA3 using Ollama

The project is about categorizing bank transactions with a locally running LLM and generating a dashboard.

- The original author's (__Thu Vu data analytics__) video, code and data can be found here:

https://www.youtube.com/watch?v=h_GTxRFYETY

https://github.com/thu-vu92/local-llms-analyse-finance

https://ollama.com/


This example includes: 

- LLM Categorization of transactions 

- Dashboard built with Plotly and Panel

### Getting Ollama: 

1. Go to ollama.com

2. Click download 

### Installing Ollama

1. Double-click the downloaded Ollama file and follow the installation instructions.

### Get models

1. In a terminal, download the Llama 3 model:

    ollama pull llama3

2. (Optional) Create your own model. First, create a Modelfile using nano:

```nano expense_analyzer```

Enter this information in the file and save it: 

        FROM llama3

        PARAMETER temperature 0.8

        SYSTEM You are a financial assistant. You help classify expenses and income from bank transactions.

3. Create the custom model from the Modelfile:

```ollama create expense_analyzer_llama3 -f ./expense_analyzer```

4. To remove a model:

```ollama rm <model_name>```
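The Modelfile from step 2 can also be generated from Python. The sketch below is a minimal illustration; `modelfile_text` and `write_modelfile` are hypothetical helper names, and the code only writes the file, so `ollama create` still has to be run in a terminal afterwards.

```python
def modelfile_text(base_model="llama3", temperature=0.8,
                   system="You are a financial assistant. You help classify "
                          "expenses and income from bank transactions."):
    """Return a minimal Ollama Modelfile: base model, one parameter and a system prompt."""
    return "\n".join([
        f"FROM {base_model}",
        f"PARAMETER temperature {temperature}",
        f"SYSTEM {system}",
    ]) + "\n"


def write_modelfile(path="expense_analyzer", **kwargs):
    """Write the Modelfile to disk so that `ollama create <name> -f <path>` can read it."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(modelfile_text(**kwargs))
    return path
```

For example, `write_modelfile("expense_analyzer", temperature=0.5)` writes a file that `ollama create expense_analyzer_llama3 -f ./expense_analyzer` can then use.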


In [1]:
from langchain_community.llms import Ollama
import pandas as pd
# Connect to the llama3 model served locally by Ollama
llm3 = Ollama(model ='llama3')

# Test that the connection to the model works:
llm3.invoke("Can you add an appropriate category next to each of the following expenses. Respond with a list of categories separated by commas. For example, Spotify AB by Adyen - \
Entertainment, Beta Boulders Ams Amsterdam Nld - Sports, etc.: \
ISS Catering Services De Meern, Vishandel Sier AMSTELVEEN, Ministerie van Justitie en Veiligheid, Etos AMSTERDAM NLD, Bistro Bar Amsterdam")

"Here is the list of expenses with added categories:\n\nISS Catering Services De Meern - Food, Vishandel Sier AMSTELVEEN - Shopping, Ministerie van Justitie en Veiligheid - Government, Etos AMSTERDAM NLD - Health and Wellness, Bistro Bar Amsterdam - Entertainment\n\nLet me know if you'd like me to add any additional categories!"
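The reply above returns the categories as one comma-separated line between a greeting and a follow-up offer. A minimal parsing sketch, assuming that layout (a single line holding `Name - Category` pairs separated by `, `) and names that contain no commas; `parse_inline_categories` is a hypothetical helper:

```python
def parse_inline_categories(response):
    """Turn 'Name - Category, Name - Category, ...' into a {name: category} dict.

    Lines without ' - ' (greetings, follow-up offers) are ignored.
    """
    result = {}
    for line in response.splitlines():
        if " - " not in line:
            continue
        for pair in line.split(", "):
            if " - " in pair:
                # Split at the last ' - ' so a name that itself contains ' - ' stays whole
                name, category = pair.rsplit(" - ", 1)
                result[name.strip()] = category.strip()
    return result
```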

In [2]:
# Load the dataset with the transactions
df = pd.read_csv('transactions_2022_2023.csv', low_memory = False )
print(df.head())
# Get unique transactions in the Name / Description column
unique_transactions = df["Name / Description"].unique()
print('\nNumber of unique transactions: ', len(unique_transactions))
unique_transactions[1:10]

         Date        Name / Description Expense/Income  Amount (EUR)
0  2023-12-30           Belastingdienst        Expense          9.96
1  2023-12-30               Tesco Breda        Expense         17.53
2  2023-12-30   Monthly Appartment Rent        Expense        451.00
3  2023-12-30  Vishandel Sier Amsterdam        Expense         12.46
4  2023-12-29         Selling Paintings         Income         13.63

Number of unique transactions:  23


array(['Tesco Breda', 'Monthly Appartment Rent',
       'Vishandel Sier Amsterdam', 'Selling Paintings',
       'Spotify Ab By Adyen', 'Tk Maxx Amsterdam Da', 'Consulting',
       'Aidsfonds', 'Tls Bv Inz Ov-Chipkaart'], dtype=object)

In [3]:
# Get the batch boundaries: a start index every `step` items, plus the final stop index
# https://stackoverflow.com/questions/47518609/for-loop-range-and-interval-how-to-include-last-step
def hop(start, stop, step):
    for i in range(start, stop, step):
        yield i
    yield stop
# Split the index into groups of 30
index_list = list(hop(0, len(unique_transactions), 30))
index_list

[0, 23]
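The `hop` generator produces the batch boundaries; the batches themselves can also be built directly with slices that step through the list. A sketch with hypothetical helper names `batches` and `batch_bounds`, where `batch_bounds(n_items, size)` returns the same list as `list(hop(0, n_items, size))`:

```python
def batches(items, size=30):
    """Split a sequence (list or numpy array) into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[start:start + size] for start in range(0, len(items), size)]


def batch_bounds(n_items, size=30):
    """Start index of every batch, followed by the final stop index."""
    return list(range(0, n_items, size)) + [n_items]
```

For example, `[','.join(batch) for batch in batches(unique_transactions)]` would give one prompt string per batch of 30 names.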

In [4]:
# Output validation using pydantic
from pydantic import BaseModel, field_validator
from typing import List

# Validate the response format: every non-empty item must contain a hyphen ("-")
class ResponseChecks(BaseModel):
    data: List[str]

    @field_validator("data")
    @classmethod
    def check(cls, value):
        for item in value:
            if len(item) > 0 and "-" not in item:
                raise ValueError(f"String does not contain hyphen: {item!r}")
        # A field validator must return the value; returning nothing sets data to None
        return value

# Test validation
ResponseChecks(data=['Hello - World', 'Hello - there!'])

ResponseChecks(data=['Hello - World', 'Hello - there!'])

In [5]:
def categorize_transactions(transaction_names, llm):
    response = llm.invoke("Can you add an appropriate category to the following expenses. For example: Spotify AB by Adyen - Entertainment, Beta Boulders Ams Amsterdam Nld - Sport, etc. Categories should be less than 4 words. " + transaction_names)
    response = response.split('\n')
    # Keep only the lines between the first two blank lines (removes the explanation lines at the beginning and end of the response)
    blank_indexes = [index for index, line in enumerate(response) if line.strip() == '']
    if len(blank_indexes) == 1:
        response = response[(blank_indexes[0] + 1):]
    else:
        # Raises IndexError when there is no blank line; the retry loop catches it
        response = response[(blank_indexes[0] + 1) : blank_indexes[1]]
    # Print the raw response lines
    print(response)
    # Remove Markdown bullet markers ("* "), which made the validation fail.
    # Note: this also removes "* " inside names, e.g. "Classpass* Monthly" becomes "ClasspassMonthly"
    response = [s.replace('* ', '') for s in response]
    # Validate the format (raises a ValidationError if a line has no hyphen)
    ResponseChecks(data = response)
    # Put in dataframe
    categories_df = pd.DataFrame({'Transaction vs category': response})
    # Splitting into 'Transaction' and 'Category' columns did not work inside this function,
    # so it is done on the returned dataframe instead
    return categories_df

In [6]:
# Test out the function
r = categorize_transactions('ISS Catering Services De Meern, Vishandel Sier AMSTELVEEN, Etos AMSTERDAM NLD, Bistro Bar Amsterdam', llm3)
r[['Transaction', 'Category']] = r['Transaction vs category'].str.split(' - ', expand=True)
r

['1. Spotify AB by Adyen - Entertainment', '2. Beta Boulders Ams Amsterdam Nld - Sport', '3. ISS Catering Services De Meern - Food', '4. Vishandel Sier AMSTELVEEN - Shopping', '5. Etos AMSTERDAM NLD - Health', '6. Bistro Bar Amsterdam - Dining']


|   | Transaction vs category | Transaction | Category |
|---|---|---|---|
| 0 | 1. Spotify AB by Adyen - Entertainment | 1. Spotify AB by Adyen | Entertainment |
| 1 | 2. Beta Boulders Ams Amsterdam Nld - Sport | 2. Beta Boulders Ams Amsterdam Nld | Sport |
| 2 | 3. ISS Catering Services De Meern - Food | 3. ISS Catering Services De Meern | Food |
| 3 | 4. Vishandel Sier AMSTELVEEN - Shopping | 4. Vishandel Sier AMSTELVEEN | Shopping |
| 4 | 5. Etos AMSTERDAM NLD - Health | 5. Etos AMSTERDAM NLD | Health |
| 5 | 6. Bistro Bar Amsterdam - Dining | 6. Bistro Bar Amsterdam | Dining |
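Instead of deleting `* ` and cleaning the numbering in later cells, each response line could be matched with one regular expression that drops an optional leading number or bullet and separates the name from the category. This is an alternative approach, not what the notebook does; `LINE_PATTERN` and `parse_category_lines` are assumptions. It requires whitespace around the separating hyphen, so names such as `Ov-Chipkaart` and `Classpass* Monthly` stay intact.

```python
import re

# Optional "1. " / "1) " numbering or "* " / "- " bullet, then "name - category"
LINE_PATTERN = re.compile(
    r"^\s*(?:\d+[.)]\s*|[*-]\s+)?(?P<name>.+?)\s+-\s+(?P<category>.+?)\s*$"
)


def parse_category_lines(lines):
    """Return (name, category) tuples for every line that matches; other lines are skipped."""
    rows = []
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match:
            rows.append((match.group("name"), match.group("category")))
    return rows
```

The result can go straight into a dataframe, e.g. `pd.DataFrame(parse_category_lines(response), columns=['Transaction', 'Category'])`.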


In [7]:
# Initialise the dataframe that collects the categories of all batches
categories_df_all = pd.DataFrame()
max_tries = 7

# Loop through the batches defined by index_list
for i in range(len(index_list) - 1):
    transaction_names = unique_transactions[index_list[i]:index_list[i+1]]
    transaction_names = ','.join(transaction_names)

    # Try and validate the output; if it fails, try again, up to max_tries attempts in total
    for attempt in range(1, max_tries + 1):
        try:
            categories_df = categorize_transactions(transaction_names, llm3)
        except Exception:
            if attempt < max_tries:
                continue
            raise RuntimeError(f"Cannot categorise transactions {index_list[i]} to {index_list[i+1]}.")
        categories_df_all = pd.concat([categories_df_all, categories_df], ignore_index=True)
        break
categories_df_all.head()

['1. Spotify AB by Adyen - Entertainment', '2. Beta Boulders Ams Amsterdam Nld - Sport', '3. Tls BV Inz Ov-Chipkaart - Transportation', '4. Etos Amsterdam - Health', '5. Audible UK AdblCo/Pymt GBR - Entertainment', '6. Apple Services - Technology', '7. Amazon Lux - Shopping', '8. Classpass* Monthly - Fitness', '9. Birtat Restaurant Amsterdam - Dining', '10. Taxi Utrecht - Transportation', '11. Consulting - Professional Services', '12. Freelancing - Professional Services', '13. Blogging - Professional Services', '14. Selling Paintings - Miscellaneous', '15. Visandel Sier Amsterdam - Shopping', '16. Tk Maxx Amsterdam Da - Shopping', '17. Belastingdienst - Government (assuming this is a tax-related expense)', '18. Tesco Breda - Shopping', '19. Monthly Appartment Rent - Housing', '20. Salary - Income', '21. Aidsfonds - Charity', '22. Tikkie - Miscellaneous']


|   | Transaction vs category |
|---|---|
| 0 | 1. Spotify AB by Adyen - Entertainment |
| 1 | 2. Beta Boulders Ams Amsterdam Nld - Sport |
| 2 | 3. Tls BV Inz Ov-Chipkaart - Transportation |
| 3 | 4. Etos Amsterdam - Health |
| 4 | 5. Audible UK AdblCo/Pymt GBR - Entertainment |
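The retry logic above can also live in a small reusable helper. A sketch, assuming any exception (such as an `IndexError` from the parsing or a pydantic `ValidationError`) should trigger another attempt; `call_with_retries` is a hypothetical name:

```python
import time


def call_with_retries(func, max_tries=7, delay_seconds=0.0):
    """Call func() until it succeeds, at most max_tries times.

    After the last failed attempt, raise a RuntimeError chained to the last error.
    """
    last_error = None
    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except Exception as error:
            last_error = error
            # Optionally wait before the next attempt
            if attempt < max_tries and delay_seconds:
                time.sleep(delay_seconds)
    raise RuntimeError(f"Failed after {max_tries} attempts") from last_error
```

Inside the batch loop this would read `categories_df = call_with_retries(lambda: categorize_transactions(transaction_names, llm3))`.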


In [8]:
categories_df_all[['Transaction', 'Category']] = categories_df_all['Transaction vs category'].str.split(' - ', expand=True)
print(categories_df_all.head(5))
# Get unique categories in categories_df_all
unique_categories = categories_df_all["Category"].unique()
unique_categories

                         Transaction vs category  \
0         1. Spotify AB by Adyen - Entertainment   
1     2. Beta Boulders Ams Amsterdam Nld - Sport   
2    3. Tls BV Inz Ov-Chipkaart - Transportation   
3                     4. Etos Amsterdam - Health   
4  5. Audible UK AdblCo/Pymt GBR - Entertainment   

                          Transaction        Category  
0              1. Spotify AB by Adyen   Entertainment  
1  2. Beta Boulders Ams Amsterdam Nld           Sport  
2          3. Tls BV Inz Ov-Chipkaart  Transportation  
3                   4. Etos Amsterdam          Health  
4       5. Audible UK AdblCo/Pymt GBR   Entertainment  


array(['Entertainment', 'Sport', 'Transportation', 'Health', 'Technology',
       'Shopping', 'Fitness', 'Dining', 'Professional Services',
       'Miscellaneous',
       'Government (assuming this is a tax-related expense)', 'Housing',
       'Income', 'Charity'], dtype=object)
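The raw categories above are inconsistent (for example `Sport` next to `Fitness`, or a category that carries an explanation in parentheses). The `.loc` rules in the next cell can be written as one ordered rule table. A sketch, assuming the same patterns and the same behaviour (each rule is checked against the current value, so a later match overwrites an earlier one); dropping parenthetical comments is an extra step the notebook does not take, and `CATEGORY_RULES` and `normalize_category` are hypothetical names:

```python
import re

# Ordered (regex pattern, canonical category) rules, mirroring the .loc assignments
CATEGORY_RULES = [
    ("Food", "Food and Drinks"),
    ("Clothing", "Clothing"),
    ("Services", "Services"),
    ("Health|Wellness", "Health and Wellness"),
    ("Sport", "Sport and Fitness"),
    ("Travel", "Travel"),
]


def normalize_category(category, rules=CATEGORY_RULES):
    # Extra step: drop comments such as "(assuming this is a tax-related expense)"
    result = re.sub(r"\s*\(.*?\)", "", category).strip()
    for pattern, canonical in rules:
        if re.search(pattern, result):
            result = canonical
    return result
```

Applied to the dataframe: `categories_df_all['Category'] = categories_df_all['Category'].map(normalize_category)`.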

In [9]:
### Manual formatting of some categories
# Drop NA values
categories_df_all = categories_df_all.dropna()
# If category contains "Food", then categorise as "Food and Drinks"
categories_df_all.loc[categories_df_all['Category'].str.contains("Food"), 'Category'] = "Food and Drinks"
# If category contains "Clothing", then categorise as "Clothing"
categories_df_all.loc[categories_df_all['Category'].str.contains("Clothing"), 'Category'] = "Clothing"
# If category contains "Services", then categorise as "Services"
categories_df_all.loc[categories_df_all['Category'].str.contains("Services"), 'Category'] = "Services"
# If category contains "Health" or "Wellness", then categorise as "Health and Wellness"
categories_df_all.loc[categories_df_all['Category'].str.contains("Health|Wellness"), 'Category'] = "Health and Wellness"
# If category contains "Sport", then categorise as "Sport and Fitness"
categories_df_all.loc[categories_df_all['Category'].str.contains("Sport"), 'Category'] = "Sport and Fitness"
# If category contains "Travel", then categorise as "Travel"
categories_df_all.loc[categories_df_all['Category'].str.contains("Travel"), 'Category'] = "Travel"



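# Remove the numbering, e.g. "1. ", from the Transaction column
# This has no effect: since pandas 2.0, str.replace treats the pattern as a literal string
# unless regex=True is passed. The numbering is removed in the next cell instead.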
categories_df_all['Transaction'] = categories_df_all['Transaction'].str.replace(r'\d+\.\s+', '')  
categories_df_all


|   | Transaction vs category | Transaction | Category |
|---|---|---|---|
| 0 | 1. Spotify AB by Adyen - Entertainment | 1. Spotify AB by Adyen | Entertainment |
| 1 | 2. Beta Boulders Ams Amsterdam Nld - Sport | 2. Beta Boulders Ams Amsterdam Nld | Sport and Fitness |
| 2 | 3. Tls BV Inz Ov-Chipkaart - Transportation | 3. Tls BV Inz Ov-Chipkaart | Transportation |
| 3 | 4. Etos Amsterdam - Health | 4. Etos Amsterdam | Health and Wellness |
| 4 | 5. Audible UK AdblCo/Pymt GBR - Entertainment | 5. Audible UK AdblCo/Pymt GBR | Entertainment |
| 5 | 6. Apple Services - Technology | 6. Apple Services | Technology |
| 6 | 7. Amazon Lux - Shopping | 7. Amazon Lux | Shopping |
| 7 | 8. ClasspassMonthly - Fitness | 8. ClasspassMonthly | Fitness |
| 8 | 9. Birtat Restaurant Amsterdam - Dining | 9. Birtat Restaurant Amsterdam | Dining |
| 9 | 10. Taxi Utrecht - Transportation | 10. Taxi Utrecht | Transportation |
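The numbering can be removed in one step by passing `regex=True` and anchoring the pattern at the start of the string, so digits or dots inside a name are left alone. A minimal sketch on a few sample names:

```python
import pandas as pd

transactions = pd.Series(["1. Spotify AB by Adyen", "10. Taxi Utrecht", "Consulting"])

# ^ anchors the pattern at the start; regex=True is required since pandas 2.0
cleaned = transactions.str.replace(r"^\d+\.\s+", "", regex=True)
print(cleaned.tolist())
```

Applied to `categories_df_all['Transaction']`, this would make the split on `'.'` in the next cell unnecessary.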


In [10]:
### Making a copy of the dataframe
cat_copy = categories_df_all.copy()
# Remove the leading numbering (e.g. "1. ") from the Transaction column;
# split at the first "." only, so names that contain a dot are kept whole
cat_copy.loc[:,'Transact'] = cat_copy['Transaction'].str.split('.', n=1, expand=True)[1]
cat_copy.loc[:,'Transact'] = cat_copy.Transact.str.strip()
cat_copy.drop(['Transaction', 'Transaction vs category'], axis = 1, inplace = True)
cat_copy.rename(columns = {'Transact': 'Transaction'}, inplace = True)
cat_copy.Transaction.unique()

array(['Spotify AB by Adyen', 'Beta Boulders Ams Amsterdam Nld',
       'Tls BV Inz Ov-Chipkaart', 'Etos Amsterdam',
       'Audible UK AdblCo/Pymt GBR', 'Apple Services', 'Amazon Lux',
       'ClasspassMonthly', 'Birtat Restaurant Amsterdam', 'Taxi Utrecht',
       'Consulting', 'Freelancing', 'Blogging', 'Selling Paintings',
       'Visandel Sier Amsterdam', 'Tk Maxx Amsterdam Da',
       'Belastingdienst', 'Tesco Breda', 'Monthly Appartment Rent',
       'Salary', 'Aidsfonds', 'Tikkie'], dtype=object)

In [11]:
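# Merge the categories (cat_copy) with the transactions_2022_2023.csv dataframe (df)
df = pd.read_csv("transactions_2022_2023.csv")
df.loc[df['Name / Description'].str.contains("Spotify"), 'Name / Description'] = "Spotify Ab By Adyen"
# The merge only matches identical names: names the LLM rewrote (e.g. "Spotify AB by Adyen",
# "Visandel Sier Amsterdam") get no category, as rows 3 and 5 of the output show
df = df.merge(cat_copy, left_on='Name / Description', right_on='Transaction', how='left')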
df

|   | Date | Name / Description | Expense/Income | Amount (EUR) | Category | Transaction |
|---|---|---|---|---|---|---|
| 0 | 2023-12-30 | Belastingdienst | Expense | 9.96 | Government (assuming this is a tax-related exp... | Belastingdienst |
| 1 | 2023-12-30 | Tesco Breda | Expense | 17.53 | Shopping | Tesco Breda |
| 2 | 2023-12-30 | Monthly Appartment Rent | Expense | 451.0 | Housing | Monthly Appartment Rent |
| 3 | 2023-12-30 | Vishandel Sier Amsterdam | Expense | 12.46 | NaN | NaN |
| 4 | 2023-12-29 | Selling Paintings | Income | 13.63 | Miscellaneous | Selling Paintings |
| 5 | 2023-12-29 | Spotify Ab By Adyen | Expense | 12.19 | NaN | NaN |
| 6 | 2023-12-23 | Tk Maxx Amsterdam Da | Expense | 27.08 | Shopping | Tk Maxx Amsterdam Da |
| 7 | 2023-12-22 | Consulting | Income | 541.57 | Services | Consulting |
| 8 | 2023-12-22 | Aidsfonds | Expense | 10.7 | Charity | Aidsfonds |
| 9 | 2023-12-20 | Consulting | Income | 2641.93 | Services | Consulting |
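Rows 3 and 5 above have no category because the LLM changed the case or spelling of the names. One way to reduce this is to merge on a normalized key (case-folded, whitespace collapsed) and report the names that still do not match. A sketch with a hypothetical helper `merge_on_normalized_name`; it fixes case and spacing differences such as `Spotify AB by Adyen`, but not typos such as `Visandel`, which are returned for manual review:

```python
import pandas as pd


def merge_on_normalized_name(transactions, categories,
                             left_on="Name / Description", right_on="Transaction"):
    """Left-merge categories onto transactions with a case- and whitespace-insensitive key.

    Returns the merged dataframe and the transaction names that found no category.
    """
    def normalize(names):
        return names.str.casefold().str.split().str.join(" ")

    left = transactions.assign(_key=normalize(transactions[left_on]))
    right = categories.assign(_key=normalize(categories[right_on])).drop_duplicates("_key")
    merged = left.merge(right, on="_key", how="left", indicator=True)
    unmatched = merged.loc[merged["_merge"] == "left_only", left_on].unique()
    return merged.drop(columns=["_key", "_merge"]), unmatched
```

Usage: `df, unmatched = merge_on_normalized_name(df, cat_copy)` in place of the plain `df.merge(...)`.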


In [12]:
categorizefilenm = "transactions_2022_2023_categorized.csv"
df.to_csv(categorizefilenm, index=False)

In [13]:
### Importing libraries for the dashboard
import pandas as pd
import numpy as np
import plotly.express as px
import panel as pn

# Load Panel's Plotly extension so that the Plotly figures render inside Panel layouts
pn.extension('plotly')

if df.empty:
    # Read the categorized transactions saved in the previous cell
    df = pd.read_csv(categorizefilenm, low_memory = False)

# Add year and month columns
dates = pd.to_datetime(df['Date'])
df['Year'] = dates.dt.year
df['Month'] = dates.dt.month
df['Month Name'] = dates.dt.strftime("%b")
# Remove the "Transaction" column ("Transaction vs category" was already dropped from cat_copy)
df = df.drop(columns=['Transaction'])
# For Income rows, assign Name / Description to Category
df['Category'] = np.where(df['Expense/Income'] == 'Income', df['Name / Description'], df['Category'])

In [14]:
def make_pie_chart(df, year, label):
    # Filter the dataset for the given label (Expense or Income) and year
    sub_df = df[(df['Expense/Income'] == label) & (df['Year'] == year)]

    color_scale = px.colors.qualitative.Set2
    
    pie_fig = px.pie(sub_df, values='Amount (EUR)', names='Category', color_discrete_sequence = color_scale)
    pie_fig.update_traces(textposition='inside', direction ='clockwise', hole=0.3, textinfo="label+percent")

    total_expense = df[(df['Expense/Income'] == 'Expense') & (df['Year'] == year)]['Amount (EUR)'].sum() 
    total_income = df[(df['Expense/Income'] == 'Income') & (df['Year'] == year)]['Amount (EUR)'].sum()
    
    if label == 'Expense':
        total_text = "€ " + str(round(total_expense))

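        # Saving rate: share of income that was not spent (omitted if there is no income)
        if total_income > 0:
            saving_rate = round((total_income - total_expense)/total_income*100)
            saving_rate_text = ": Saving rate " + str(saving_rate) + "%"
        else:
            saving_rate_text = ""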
    else:
        saving_rate_text = ""
        total_text = "€ " + str(round(total_income))

    pie_fig.update_layout(uniformtext_minsize=10, 
                        uniformtext_mode='hide',
                        title=dict(text=label+" Breakdown " + str(year) + saving_rate_text),
                        # Add annotations in the center of the donut.
                        annotations=[
                            dict(
                                text=total_text, 
                                # Square unit grid starting at bottom left of page
                                x=0.5, y=0.5, font_size=12,
                                # Hide the arrow that points to the [x,y] coordinate
                                showarrow=False
                            )
                        ]
                    )
    return pie_fig
income_pie_fig_2022 = make_pie_chart(df, 2022, 'Income')
income_pie_fig_2022
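The saving rate in the expense chart title is the share of income that was not spent: (income - expenses) / income x 100. For example, an income of €2,000 with €1,500 of expenses gives (2000 - 1500) / 2000 x 100 = 25%. A small helper with a hypothetical name, `saving_rate`, that also handles a year without income:

```python
def saving_rate(total_income, total_expense):
    """Percentage of income left after expenses, or None when there is no income."""
    if total_income <= 0:
        return None
    return round((total_income - total_expense) / total_income * 100)
```

A negative value means that expenses exceeded income in that year.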

In [15]:
def make_monthly_bar_chart(df, year, label):
    df = df[(df['Expense/Income'] == label) & (df['Year'] == year)]
    total_by_month = (df.groupby(['Month', 'Month Name'])['Amount (EUR)'].sum()
                        .to_frame()
                        .reset_index()
                        .sort_values(by='Month')  
                        .reset_index(drop=True))
    # Green scale for income, red scale for expenses
    if label == "Income":
        color_scale = px.colors.sequential.YlGn
    else:
        color_scale = px.colors.sequential.OrRd
    
    bar_fig = px.bar(total_by_month, x='Month Name', y='Amount (EUR)', text_auto='.2s', title=label+" per month", color='Amount (EUR)', color_continuous_scale=color_scale)
    # bar_fig.update_traces(marker_color='lightslategrey')
    
    return bar_fig
income_monthly_2022 = make_monthly_bar_chart(df, 2022, 'Income')
income_monthly_2022
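`groupby` only returns months that have transactions, so a month without income or expenses simply disappears from the bar chart. A sketch that reindexes the totals to all twelve months, assuming the `Year`, `Month` and `Amount (EUR)` columns created above; `monthly_totals` is a hypothetical helper and month abbreviations follow the current locale:

```python
import pandas as pd


def monthly_totals(df, year, label):
    """Sum 'Amount (EUR)' per month for one year and label, with 0 for months without rows."""
    months = list(range(1, 13))
    subset = df[(df["Expense/Income"] == label) & (df["Year"] == year)]
    totals = subset.groupby("Month")["Amount (EUR)"].sum().reindex(months, fill_value=0)
    month_names = pd.to_datetime([f"{year}-{m:02d}-01" for m in months]).strftime("%b")
    return pd.DataFrame({
        "Month": months,
        "Month Name": list(month_names),
        "Amount (EUR)": totals.to_numpy(),
    })
```

The result can be passed to `px.bar(..., x='Month Name', y='Amount (EUR)')` in place of `total_by_month`.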

In [16]:
# Pie charts
income_pie_fig_2022 = make_pie_chart(df, 2022, 'Income')
expense_pie_fig_2022 = make_pie_chart(df, 2022, 'Expense')  
income_pie_fig_2023 = make_pie_chart(df, 2023, 'Income')
expense_pie_fig_2023 = make_pie_chart(df, 2023, 'Expense')

# Bar charts
income_monthly_2022 = make_monthly_bar_chart(df, 2022, 'Income')
expense_monthly_2022 = make_monthly_bar_chart(df, 2022, 'Expense')
income_monthly_2023 = make_monthly_bar_chart(df, 2023, 'Income')
expense_monthly_2023 = make_monthly_bar_chart(df, 2023, 'Expense')

# Create one tab per year: pie charts on top, monthly bar charts below
tabs = pn.Tabs(
    ('2022', pn.Column(pn.Row(income_pie_fig_2022, expense_pie_fig_2022),
                       pn.Row(income_monthly_2022, expense_monthly_2022))),
    ('2023', pn.Column(pn.Row(income_pie_fig_2023, expense_pie_fig_2023),
                       pn.Row(income_monthly_2023, expense_monthly_2023))),
)
tabs.show()

Launching server at http://localhost:57890


<panel.io.server.Server at 0x13154e240>

In [17]:
# Dashboard template
template = pn.template.FastListTemplate(
    title='Personal Finance Dashboard',
    sidebar=[pn.pane.Markdown("# Income Expense analysis"), 
             pn.pane.Markdown("Overview of income and expense based on my bank transactions. Categories are obtained using local LLMs."),
             pn.pane.PNG("picture.png", sizing_mode="scale_both")
             ],
    main=[pn.Row(pn.Column(pn.Row(tabs)))],
    # accent_base_color="#88d8b0",
    header_background="#c0b9dd",
)

template.show()

Launching server at http://localhost:57891


<panel.io.server.Server at 0x13164d0d0>



____
## Example 2  - LLAMA3 using LangChain

This example runs Llama 3 locally through LangChain's Ollama integration, based on LangChain's local RAG agent example built with LangGraph and Llama 3.

- The original author's (__LangChain__) video, code and documentation can be found here:

https://www.youtube.com/watch?v=-ROS6gfYIts

https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_rag_agent_llama3_local.ipynb

https://python.langchain.com/v0.2/docs/integrations/llms/ollama/





In [22]:
### Optional: tracing with LangSmith
### Create an account at https://smith.langchain.com/
### Generate an API key in the settings menu


import sys, os
# Load the API key from a local module (../secret/secret_info.py) kept outside the notebook
sys.path.append(os.path.abspath(os.path.join('../', 'secret')))
from secret_info import lanchain_trace_api

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = lanchain_trace_api
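Instead of adding a folder to `sys.path`, the key can also be read from an environment variable exported before Jupyter starts. A sketch that sets the same three variables as the cell above; `enable_langsmith_tracing` is a hypothetical helper:

```python
import os


def enable_langsmith_tracing(api_key=None):
    """Set the LangSmith tracing variables.

    The key comes from the argument or from an already exported LANGCHAIN_API_KEY.
    """
    api_key = api_key or os.environ.get("LANGCHAIN_API_KEY")
    if not api_key:
        raise RuntimeError("No LangSmith API key: pass one or export LANGCHAIN_API_KEY")
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
    os.environ["LANGCHAIN_API_KEY"] = api_key
```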

In [18]:
from langchain_community.llms import Ollama

llm = Ollama(
    model="llama3"
)  # assuming you have Ollama installed and have pulled the llama3 model with `ollama pull llama3`

llm.invoke("Tell me a joke")

"Here's one:\n\nWhy couldn't the bicycle stand up by itself?\n\n(wait for it...)\n\nBecause it was two-tired!\n\nHope that made you laugh! Do you want to hear another one?"
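LangChain's `Ollama` wrapper talks to the local Ollama server over HTTP. For reference, the same kind of call can be made with only the standard library through Ollama's `/api/generate` endpoint on its default port 11434. A minimal sketch, assuming a default local install with the server running; `build_generate_request` and `generate` are hypothetical helpers:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama address


def build_generate_request(prompt, model="llama3", url=OLLAMA_URL):
    """Build a non-streaming generate request for the Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3", timeout=120):
    """Send the prompt to the running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running, `generate("Tell me a joke")` returns the generated text as a string.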