Many "Chat with your Data" tools fail because they ask a Large Language Model (LLM) to perform the calculations itself. LLMs are unreliable at arithmetic, and the problem grows with dataset size: the model cannot hold every row in its context, so it tends to hallucinate totals, averages, and counts. A more effective approach is to have the model generate the code (for example, Python) that performs the computation, and then run that code to get an exact answer.

This approach builds a robust Personal AI Data Analyst by:

1. Ingesting flat data files (CSV, Excel, JSON).

2. Detecting column types (Numeric, Categorical).

3. Reasoning to translate natural language questions (e.g., "Show me the outliers") into executable Python code.

4. Executing the code in a safe environment and returning the precise result.
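The ingestion and type-detection steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual implementation: the helper names `load_table` and `detect_column_types`, the two-way Numeric/Categorical split, and the inline sample data are all assumptions made for the example.

```python
import io

import pandas as pd


def load_table(source, kind):
    """Load a flat file into a DataFrame (hypothetical helper).

    `kind` is one of "csv", "excel", or "json".
    """
    readers = {"csv": pd.read_csv, "excel": pd.read_excel, "json": pd.read_json}
    if kind not in readers:
        raise ValueError(f"unsupported file kind: {kind!r}")
    return readers[kind](source)


def detect_column_types(df):
    """Label each column as "Numeric" or "Categorical" (hypothetical helper)."""
    types = {}
    for name in df.columns:
        column = df[name]
        # Booleans count as numeric in pandas, so treat them as categories first.
        if pd.api.types.is_bool_dtype(column):
            types[name] = "Categorical"
        elif pd.api.types.is_numeric_dtype(column):
            types[name] = "Numeric"
        else:
            types[name] = "Categorical"
    return types


# Made-up sample data standing in for an uploaded CSV file.
sample = io.StringIO("city,price,units\nParis,10.5,3\nRome,8.0,5\nParis,12.0,2\n")
df = load_table(sample, "csv")
schema = detect_column_types(df)
print(schema)
```

A schema like this is what would typically be passed to the model alongside the user's question, so that the generated code refers to real column names rather than guessed ones.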

This method effectively merges the AI's language understanding with Python's mathematical precision.
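To make the division of labour concrete, here is a minimal sketch of the execution step, assuming the model has already returned a snippet of Python as a string. The function name `run_generated_code`, the convention that the snippet stores its answer in a variable called `result`, and the hard-coded `generated_code` string (a stand-in for a model reply to "Show me the outliers") are all assumptions for illustration. Restricting `__builtins__` only reduces accidental damage; it is not a real sandbox, and untrusted code should run in a separate process or container.

```python
import pandas as pd


def run_generated_code(code, df):
    """Run model-generated code against a copy of `df` and return `result`.

    Hypothetical helper: the restricted builtins below are NOT a security
    boundary, only a guard against obvious mistakes.
    """
    safe_builtins = {
        "len": len, "min": min, "max": max, "sum": sum,
        "abs": abs, "round": round, "sorted": sorted, "range": range,
    }
    # `__import__` is not exposed, so a bare `import` statement in the snippet fails.
    namespace = {"__builtins__": safe_builtins, "pd": pd, "df": df.copy()}
    exec(code, namespace)
    if "result" not in namespace:
        raise ValueError("generated code did not set `result`")
    return namespace["result"]


# Made-up data: one price is far from the rest.
df = pd.DataFrame({"price": [10.0, 12.0, 11.0, 95.0, 9.0]})

# Stand-in for code the model might generate (interquartile-range rule).
generated_code = """
q1 = df["price"].quantile(0.25)
q3 = df["price"].quantile(0.75)
iqr = q3 - q1
mask = (df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)
result = df.loc[mask, "price"].tolist()
"""

outliers = run_generated_code(generated_code, df)
print(outliers)  # [95.0]
```

The model only has to choose the method (here, the IQR rule); pandas does the arithmetic, so the answer is exact regardless of how many rows the table has.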

Prerequisites

In [1]:
pip install streamlit pandas matplotlib numpy scipy openpyxl

Collecting streamlit
  Downloading streamlit-1.51.0-py3-none-any.whl.metadata (9.5 kB)
Collecting openpyxl
  Downloading openpyxl-3.1.5-py2.py3-none-any.whl.metadata (2.5 kB)
Collecting altair!=5.4.0,!=5.4.1,<6,>=4.0 (from streamlit)
  Downloading altair-5.5.0-py3-none-any.whl.metadata (11 kB)
Collecting blinker<2,>=1.5.0 (from streamlit)
  Downloading blinker-1.9.0-py3-none-any.whl.metadata (1.6 kB)
Collecting cachetools<7,>=4.0 (from streamlit)
  Using cached cachetools-6.2.2-py3-none-any.whl.metadata (5.6 kB)
Collecting click<9,>=7.0 (from streamlit)
  Downloading click-8.3.1-py3-none-any.whl.metadata (2.6 kB)
Collecting protobuf<7,>=3.20 (from streamlit)
  Downloading protobuf-6.33.1-cp39-abi3-manylinux2014_x86_64.whl.metadata (593 bytes)
Collecting pyarrow<22,>=7.0 (from streamlit)
  Downloading pyarrow-21.0.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting tenacity<10,>=8.1.0 (from streamlit)
  Downloading tenacity-9.1.2-py3-none-any.whl.metadata (1.2 kB)
Colle

In [1]:
!ollama pull llama3.1

[?2026h[?25l[1Gpulling manifest ⠋ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠙ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠹ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠸ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠼ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠴ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠦ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest [K
pulling 667b0c1932bc: 100% ▕██████████████████▏ 4.9 GB                         [K
pulling 948af2743fc7: 100% ▕██████████████████▏ 1.5 KB                         [K
pulling 0ba8f0e314b4: 100% ▕██████████████████▏  12 KB                         [K
pulling 56bb8bd477a5: 100% ▕██████████████████▏   96 B                         [K
pulling 455f34728c9b: 100% ▕██████████████████▏  487 B                         [K
verifying sha256 digest [K
writing manifest [K
success [K[?25h[?2026l


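In [4]:
# Shell commands in a notebook cell need a leading "!"; without it the cell is parsed as Python and raises a SyntaxError.
!curl -fsSL https://ollama.com/install.sh | sh
# Start the server in the background so the cell returns (the log file name is an arbitrary choice).
!nohup ollama serve > ollama.log 2>&1 &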

In [5]:
!echo "Explain the difference between a list and a tuple in Python." | ollama run llama3.1

[?2026h[?25l[1G⠙ [K[?25h[?2026l[?2026h[?25l[1G⠹ [K[?25h[?2026l[?2026h[?25l[1G⠹ [K[?25h[?2026l[?25l[?2026h[?25l[1G[K[?25h[?2026l[2K[1G[?25h**[?25l[?25hLists[?25l[?25h vs[?25l[?25h T[?25l[?25huples[?25l[?25h in[?25l[?25h Python[?25l[?25h**

[?25l[?25hIn[?25l[?25h Python[?25l[?25h,[?25l[?25h `[?25l[?25hlist[?25l[?25h`[?25l[?25h and[?25l[?25h `[?25l[?25htuple[?25l[?25h`[?25l[?25h are[?25l[?25h two[?25l[?25h types[?25l[?25h of[?25l[?25h data[?25l[?25h structures[?25l[?25h that[?25l[?25h can[?25l[?25h be[?25l[?25h [K
used[?25l[?25h to[?25l[?25h store[?25l[?25h collections[?25l[?25h of[?25l[?25h items[?25l[?25h.[?25l[?25h While[?25l[?25h they[?25l[?25h share[?25l[?25h some[?25l[?25h similarities[?25l[?25h,[?25l[?25h the[3D[K
there[?25l[?25h are[?25l[?25h key[?25l[?25h differences[?25l[?25h between[?25l[?25h them[?25l[?25h.

[?25l[?25h**[?25l[?25hList[?25l[?25h**
[?