This app receives a laboratory PDF report and extracts the following special core analysis (SCAL) fields:
- Borehole Name
- Formation
- Lithology
- Sample_Depth (m)
- Permeability to Air (mD)
- Porosity (%)
- Initial Water Saturation (%)
- Irreducible Water Saturation (%)
- Water Recovery (%)
- Residual Oil Saturation (%)
- Oil Recovery (%)
You can run this API online directly from your GitHub repository using GitHub Codespaces:
- Push this repository to GitHub.
- Click Code ➜ Codespaces ➜ Create codespace on main.
- In the terminal inside Codespaces, run:
uvicorn app.main:app --host 0.0.0.0 --port 8000
- Open the forwarded port 8000 in the browser.
- Use /docs for Swagger UI and test POST /analyze.
The repo includes .devcontainer/devcontainer.json so dependencies are installed automatically.
To run locally:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --reload
Server runs at http://127.0.0.1:8000.
To run with Docker:
docker build -t core-analysis-api .
docker run --rm -p 8000:8000 core-analysis-api
Open the app root page in the browser:
- http://127.0.0.1:8000/ (local)
- or your Codespaces forwarded URL
The page provides:
- PDF upload button
- Analyze action
- Results table with all SCAL fields
The POST /analyze endpoint:
- Form-data field: file (PDF)
- Returns extracted records as JSON.
Example:
curl -X POST "http://127.0.0.1:8000/analyze" \
  -F "file=@/path/to/core-analysis.pdf"
Notes:
- Parser reads tables from PDF pages using pdfplumber.
- It maps column headers using flexible regex patterns (e.g., Swirr (%) for irreducible water saturation).
- If your lab template has different header names, extend COLUMN_PATTERNS in app/main.py.
- CI workflow (.github/workflows/ci.yml) validates syntax and Docker build on each push/PR.
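The header mapping described above can be sketched roughly as follows. This is a minimal illustration, not the code in app/main.py: the pattern keys, the regular expressions, and the helper names (map_headers, rows_to_records, to_float) are all assumptions made for the example. The input is a table shaped the way pdfplumber's page.extract_tables() returns one (a list of rows, each a list of strings or None), and the sample values are made up.

```python
import re

# Hypothetical patterns for illustration only; the real COLUMN_PATTERNS in
# app/main.py may use different keys and expressions.
COLUMN_PATTERNS = {
    "porosity_pct": re.compile(r"\bporosity\b", re.IGNORECASE),
    "permeability_air_md": re.compile(
        r"\bpermeability\s+to\s+air\b|\bk\s*air\b", re.IGNORECASE
    ),
    "irreducible_water_saturation_pct": re.compile(
        r"\bswirr\b|\birreducible\s+water\b", re.IGNORECASE
    ),
    "residual_oil_saturation_pct": re.compile(
        r"\bsor\b|\bresidual\s+oil\b", re.IGNORECASE
    ),
}


def map_headers(headers):
    """Return {column_index: field_name} for headers that match a pattern."""
    mapping = {}
    for index, header in enumerate(headers):
        # PDF cells often contain line breaks; collapse all whitespace first.
        text = " ".join((header or "").split())
        for field, pattern in COLUMN_PATTERNS.items():
            if pattern.search(text):
                mapping[index] = field
                break  # first matching pattern wins
    return mapping


def to_float(cell):
    """Convert a cell to float, or None if it is empty or not numeric."""
    if cell is None:
        return None
    try:
        return float(cell.strip().replace(",", ""))
    except ValueError:
        return None


def rows_to_records(table):
    """Treat the first row as the header and convert the rest to dicts."""
    header, *rows = table
    mapping = map_headers(header)
    records = []
    for row in rows:
        record = {}
        for index, field in mapping.items():
            cell = row[index] if index < len(row) else None
            record[field] = to_float(cell)
        records.append(record)
    return records


if __name__ == "__main__":
    # Made-up sample table, not taken from a real report.
    sample = [
        ["Sample No.", "Porosity\n(%)", "Swirr (%)"],
        ["1", "18.5", "22.1"],
    ]
    print(rows_to_records(sample))
```

In this sketch, supporting a new lab template means adding or widening an entry in the pattern dictionary. Columns whose headers match no pattern are ignored.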