In [1]:
from langchain_community.llms import Ollama
llm = Ollama(model="llama2")

In [2]:
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an expert data scientist."),
    ("user", "{input}")
])

chain = prompt | llm 

In [3]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

In [4]:
chain = prompt | llm | output_parser

In [5]:
chain.invoke({"input": "define data"})

'\nAs an expert in data science, I can define data as:\n\nData refers to any form of information that is collected, stored, and processed for analysis or insights. It can be in the form of numbers, text, images, audio, or even videos. Data can come from various sources, such as sensors, databases, APIs, or even social media platforms. The primary goal of data science is to extract insights and knowledge from these data sources using various techniques, tools, and algorithms.\n\nData can be structured or unstructured, depending on its form and organization. Structured data is organized in a specific format, such as tables or spreadsheets, while unstructured data does not have a predefined format and requires manual analysis to extract insights. Examples of structured data include customer transaction records, financial reports, and inventory levels, while examples of unstructured data include emails, social media posts, and audio recordings.\n\nData science involves various activities, 
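
The `|` operator used in `prompt | llm | output_parser` chains steps so that each one receives the previous step's output. The block below is a minimal sketch of that idea in plain Python. It is not LangChain's implementation: `Step`, `toy_prompt`, `toy_llm` and `toy_parser` are hypothetical stand-ins, and no model is called.

```python
class Step:
    """Hypothetical stand-in for a runnable: wraps a one-argument function."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Toy stand-ins for prompt, llm and output_parser (assumptions for illustration).
toy_prompt = Step(lambda d: f"System: You are an expert.\nUser: {d['input']}")
toy_llm = Step(lambda text: {"content": f"[echo] {text.splitlines()[-1]}"})
toy_parser = Step(lambda message: message["content"])

toy_chain = toy_prompt | toy_llm | toy_parser
print(toy_chain.invoke({"input": "define data"}))
```

Here the toy model returns a message-like dict so that the parser has something to unwrap. When a model step already returns a plain string, a string parser at the end of the chain has little left to do.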

In [6]:
from langchain_community.embeddings import OllamaEmbeddings
import pandas as pd
from langchain_community.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(
    file_path='example.csv',
    csv_args={
        'delimiter': ',',
        'quotechar': '"',
        'fieldnames': ['Sentence', 'Label'],
    },
)


data = loader.load()
#data = pd.read_csv("data.csv")
#print(data[10].page_content[:])

embeddings = OllamaEmbeddings()
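
One detail worth checking in the loader cell is the `fieldnames` argument. Assuming `CSVLoader` forwards `csv_args` to Python's `csv.DictReader`, passing `fieldnames` makes the reader treat the first line of the file as data rather than as a header. The sketch below shows this with the standard library only. The two-line `sample` file and the `rows_to_text` helper are hypothetical, and the `column: value` layout is only an approximation of the loader's page content. Whether `example.csv` actually has a header row is not known from the notebook.

```python
import csv
import io

# Hypothetical file content: a header row followed by one data row.
sample = 'Sentence,Label\n"select * from users where id = 1",0\n'


def rows_to_text(text, fieldnames=None):
    """Render each CSV row as 'column: value' lines (approximate layout)."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter=",", quotechar='"', fieldnames=fieldnames
    )
    return [
        "\n".join(f"{key}: {value}" for key, value in row.items())
        for row in reader
    ]


with_names = rows_to_text(sample, fieldnames=["Sentence", "Label"])
without_names = rows_to_text(sample)

print(len(with_names), len(without_names))  # header counted as data vs. not
print(with_names[0])
```

If the file does start with a header line, dropping `fieldnames` (or skipping the first document) would keep the header out of the vector store.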

In [7]:
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter


text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(data)
vector = FAISS.from_documents(documents, embeddings)
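
To make the role of the vector store concrete, here is a minimal sketch of nearest-neighbour retrieval over embeddings. It uses cosine similarity for simplicity, which may differ from the metric the FAISS index actually uses. The three-dimensional vectors, the `toy_index` entries and the `top_k` helper are all hypothetical; real embeddings come from the embedding model and have many more dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Hypothetical (text, embedding) pairs standing in for the indexed documents.
toy_index = [
    ("select * from users", [0.9, 0.1, 0.0]),
    ("drop table users", [0.7, 0.3, 0.1]),
    ("hello world", [0.0, 0.2, 0.9]),
]


def top_k(query_vector, k=2):
    """Return the k texts whose embeddings are most similar to the query."""
    ranked = sorted(
        toy_index, key=lambda item: cosine(query_vector, item[1]), reverse=True
    )
    return [text for text, _ in ranked[:k]]


print(top_k([1.0, 0.0, 0.0]))
```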

In [8]:
# Generate generic regexes by grouping queries

from langchain.chains import create_retrieval_chain

from langchain.chains.combine_documents import create_stuff_documents_chain

prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")

document_chain = create_stuff_documents_chain(llm, prompt)

retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)
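
The "stuff" strategy places all retrieved documents into the `{context}` slot of the prompt before the model is called. A minimal sketch of that step in plain Python follows. The `stuff_prompt` helper, the placeholder chunks and the blank-line separator are assumptions for illustration, not the library's code.

```python
def stuff_prompt(question, docs, separator="\n\n"):
    """Join retrieved chunks and fill the prompt template from this cell."""
    context = separator.join(docs)
    return (
        "Answer the following question based only on the provided context:\n\n"
        f"<context>\n{context}\n</context>\n\n"
        f"Question: {question}"
    )


# Placeholder chunks; in the real chain these come from the retriever.
retrieved = ["first retrieved chunk", "second retrieved chunk"]
print(stuff_prompt("what is the content of this database?", retrieved))
```

Because every retrieved chunk is pasted into one prompt, the number and size of chunks are bounded by the model's context window.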

In [9]:
from langchain_core.documents import Document

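print(document_chain.invoke({
    "input": "generate a generic regex that matches the most common queries present on this database",
    # Single-quoted string so the column names keep their double quotes
    # (doubled quotes inside a double-quoted Python string are silently dropped).
    "context": [Document(page_content='This is a database where the "Sentence" column represents SQL queries and the "Label" column indicates whether the SQL query is valid')]
}))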


Given the context you provided, I can generate a regex pattern that matches the most common SQL queries present in the database. Here's one possible approach:

1. Identify the most common keywords and phrases used in SQL queries:
	* SELECT
	* FROM
	* WHERE
	* JOIN
	* GROUP BY
	* HAVING
	* ORDER BY
	* LIMIT
2. Create a regex pattern that matches any string containing at least one of these keywords or phrases:
```regex
^(?:SELECT|FROM|WHERE|JOIN|GROUP BY|HAVING|ORDER BY|LIMIT)\b
```
Explanation:

* `^` matches the beginning of a line.
* `(?:` is a non-capturing group that allows us to match any of the following keywords without capturing their text.
	+ `SELECT` matches the string "SELECT".
	+ `FROM` matches the string "FROM".
	+ `WHERE` matches the string "WHERE".
	+ `JOIN` matches the string "JOIN".
	+ `GROUP BY` matches the string "GROUP BY".
	+ `HAVING` matches the string "HAVING".
	+ `ORDER BY` matches the string "ORDER BY".
	+ `LIMIT` matches the string "LIMIT".
* `\b` matches a wo
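
A pattern suggested by the model is worth testing before it is used. The sketch below runs the pattern from the output above against a few sample strings with Python's `re` module. Two of the strings appear in this notebook's retrieval output; the upper-case variant and the `check` helper are hypothetical additions.

```python
import re

# Pattern copied from the model output above.
pattern = r"^(?:SELECT|FROM|WHERE|JOIN|GROUP BY|HAVING|ORDER BY|LIMIT)\b"

samples = [
    "select * from users where id = 1",
    "SELECT * FROM users WHERE id = 1",  # hypothetical upper-case variant
    "x' and members.email is NULL",
]


def check(text):
    """Return (case-sensitive match, case-insensitive match) as booleans."""
    strict = bool(re.search(pattern, text))
    relaxed = bool(re.search(pattern, text, flags=re.IGNORECASE))
    return strict, relaxed


for text in samples:
    print(repr(text), check(text))
```

As written, the pattern is case-sensitive and anchored to the start of the string, so the lower-case query only matches once `re.IGNORECASE` is added, and the fragment starting with `x'` does not match either way.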

In [10]:
response = retrieval_chain.invoke({"input": "what is the content of this database?"})
print(response["answer"])

Based on the provided context, we can see that there are four sentences in the database:

1. `x' and members.email is NULL` - This sentence does not provide any meaningful information about the contents of the database, as it simply checks whether a specific condition is true or false.
2. `||utl_http.request  ( 'httP://192.168.1.1/')` - This sentence appears to be an HTTP request to the URL `https://192.168.1.1/` .
3. `select * from users where id = 1` - This sentence selects all columns (`*`) from the `users` table where the `id` column is equal to 1.
4. `1; ( load_file ( char (47101116994711297115115119100) ) ) 111;` - This sentence appears to be a SQL statement that loads a file into the database, but the context does not provide enough information to determine what the file contains.

Based on these four sentences, we cannot determine the content of the database with certainty. The first sentence simply checks a condition, while the second and third sentences retrieve data from the

In [11]:
response = retrieval_chain.invoke({"input": "generate generic regexes that match boolean values and numeric values in the document"})
print(response["answer"])

To generate generic regular expressions that match both Boolean and numeric values in the provided context, we can use a combination of pattern matching and quantifiers. Here are two possible solutions:

1. Using `(?:)` non-capturing group:
```
Question: What are some generic regexes that match boolean and numeric values in the document?

Answer: Sure! Here are two possibilities:

Regex 1: `(?:^|[\s\S])(?:true|false|1|0)` - This regex matches either a capitalized "true" or "false", or a number that is either 1 or 0.

Regex 2: `(?:^|[\s\S])(?:\d+|\D)` - This regex matches either a digit (that is not followed by a whitespace character) or the empty string (""). This covers both numeric and Boolean values.
```
Explanation:

* `(?:^|[\s\S])` uses a non-capturing group to match either the beginning of the string (`^`) or any whitespace characters (`[\s\S]`).
* `(?:true|false|1|0)` uses a non-capturing group to match one of the specified Boolean values or numbers. The `|` character is used t
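
The two patterns in the output above behave differently from how the explanation describes them, which a quick check with `re` makes visible: `[\s\S]` matches any character, not only whitespace, and the first pattern is lower-case only. The stricter `boolean_re` and `numeric_re` patterns at the end are a hypothetical alternative, not something the model or the notebook proposed.

```python
import re

# Patterns copied from the model output above.
regex_1 = r"(?:^|[\s\S])(?:true|false|1|0)"
regex_2 = r"(?:^|[\s\S])(?:\d+|\D)"

# regex_1 also matches inside longer words and misses upper-case values.
inside_word = re.search(regex_1, "untrue")
upper_case = re.search(regex_1, "TRUE")

# regex_2 matches any non-empty string, so it does not discriminate at all.
matches_anything = all(
    re.search(regex_2, s) for s in ["abc", " ", "x' and members.email is NULL"]
)

# Hypothetical stricter alternative using word boundaries.
boolean_re = re.compile(r"\b(?:true|false)\b", re.IGNORECASE)
numeric_re = re.compile(r"\b\d+(?:\.\d+)?\b")

sample = "id = 1 and active = TRUE or 3.5"
print(inside_word.group(0), upper_case, matches_anything)
print(boolean_re.findall(sample), numeric_re.findall(sample))
```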

In [17]:
from langchain_community.chat_models import ChatOllama

# Python equivalent of the JavaScript snippet, which cannot run in a Python kernel.
chat_model = ChatOllama(
    base_url="http://localhost:11434",  # Default value
    model="mistral",
)