Merged
36 commits
fb96984 mojo_csv (Phelsong, Mar 12, 2025)
0cccda6 Update recipe.yaml for CI (Mar 13, 2025)
4799442 unquote for ci (Phelsong, Mar 19, 2025)
532a532 update test (Phelsong, Mar 25, 2025)
acae1eb Merge branch 'main' into main (carolinefrasca, Apr 14, 2025)
a79c0d9 Update source git URL (carolinefrasca, Apr 14, 2025)
e93d802 change tests syntax (carolinefrasca, Apr 14, 2025)
0468a4a Merge branch 'main' into main (carolinefrasca, Apr 14, 2025)
9b83dea Merge branch 'main' into main (Phelsong, Apr 23, 2025)
28c6b0b Merge branch 'main' into main (Phelsong, Apr 29, 2025)
181b35b Merge branch 'main' into main (carolinefrasca, Apr 30, 2025)
63c9619 Merge branch 'main' into main (carolinefrasca, Apr 30, 2025)
811f118 verify 25.3.0 and update test (Phelsong, Jun 3, 2025)
433e85f Merge branch 'main' into main (Phelsong, Jun 3, 2025)
87058f1 Merge branch 'main' into main (carolinefrasca, Jun 3, 2025)
95caa34 update test (Phelsong, Jun 5, 2025)
564d561 Merge remote-tracking branch 'origin/main' (Phelsong, Jun 5, 2025)
5a4d112 Merge branch 'main' into main (Phelsong, Jun 5, 2025)
ed4bd1a Merge branch 'main' into main (carolinefrasca, Jun 5, 2025)
e36062c update csv path (Phelsong, Jun 5, 2025)
5107692 Merge branch 'main' into main (Phelsong, Jun 5, 2025)
30ed096 Merge branch 'main' into main (carolinefrasca, Jun 9, 2025)
770b35c update test to cwd (Phelsong, Jun 10, 2025)
0967a1b Merge remote-tracking branch 'origin/main' (Phelsong, Jun 10, 2025)
8a55ce0 Merge branch 'main' into main (Phelsong, Jun 10, 2025)
fb67417 update build and versioning (Phelsong, Jun 18, 2025)
3203ee0 Merge branch 'main' into main (Phelsong, Jun 18, 2025)
c02eef6 add logo (Phelsong, Jun 19, 2025)
378904b update readme (Phelsong, Jun 19, 2025)
55ac9cb Merge branch 'main' into main (carolinefrasca, Jun 24, 2025)
ccd9da8 update mojo_csv to 1.3 (Phelsong, Jul 7, 2025)
80e98ad Merge branch 'main' into main (Phelsong, Jul 7, 2025)
573e99f 1.4.0 (Phelsong, Jul 21, 2025)
14c23d3 update mojo_csv to 1.4.0 (Phelsong, Jul 21, 2025)
f39d27c mojo_csv_1.5.0/mojo_25.5.0 (Phelsong, Aug 8, 2025)
8b3364c Merge branch 'main' into main (Phelsong, Aug 8, 2025)
160 changes: 73 additions & 87 deletions recipes/mojo_csv/README.md
@@ -17,75 +17,60 @@ Add the Modular community channel (https://repo.prefix.dev/modular-community) to
channels = ["conda-forge", "https://conda.modular.com/max", "https://repo.prefix.dev/modular-community"]
```

`pixi add mojo_csv`

##### Basic Usage
```sh
pixi add mojo_csv
```

```mojo
from mojo_csv import CsvReader
from pathlib import Path
## Usage

fn main():
    var csv_path = Path("path/to/csv/file.csv")
    var reader = CsvReader(csv_path)
    for i in range(len(reader)):
        print(reader[i])
```

##### Optional Usage
By default uses all logical cores - 2
```mojo
CsvReader(
    in_csv: Path,
    delimiter: String = ",",
    quotation_mark: String = '"',
    num_threads: Int = 0, # default = 0 = use all available cores - 2
)
```

```mojo
from mojo_csv import CsvReader
from pathlib import Path
from sys import exit

fn main():
fn main() raises:
    var csv_path = Path("path/to/csv/file.csv")
    var reader = CsvReader(csv_path, delimiter="|", quotation_mark='*')
    try:
        var reader = CsvReader(csv_path)
    except:
        exit()
    for i in range(len(reader)):
        print(reader[i])
```

#### BETA
1.4.0 will be the last version where this isn't the default

```mojo
ThreadedCsvReader(
    file_path: Path,
    delimiter: String = ",",
    quotation_mark: String = '"',
    num_threads: Int = 0 # 0 = use all available cores
)
```

### Example 1: Default (All Cores)
### Delimiters

```mojo
var reader = ThreadedCsvReader(Path("large_file.csv"))
// Uses all 16 cores on a 16-core system
CsvReader(csv_path, delimiter=";", quotation_mark='|')
```

### Example 2: Custom Thread Count

### Threads
__force single threaded__
```mojo
var reader = ThreadedCsvReader(Path("data.csv"), num_threads=4)
// Uses exactly 4 threads
CsvReader(csv_path, num_threads = 1)
```

### Example 3: Single-threaded

__use all the threads__
```mojo
var reader = ThreadedCsvReader(Path("data.csv"), num_threads=1)
// Forces single-threaded execution (same as CsvReader)
```

### Example 4: Custom Delimiter
from sys import num_logical_cores

````mojo
var reader = ThreadedCsvReader(
    Path("pipe_separated.csv"),
    delimiter="|",
    num_threads=8
var reader = CsvReader(
    csv_path, num_threads = num_logical_cores()
)
```


### Attributes

@@ -99,7 +84,7 @@ reader.elements : List[String] # all delimited elements
reader.length : Int # total number of elements
````

##### Indexing
### Indexing

Currently the array is only 1D, so indexing is fairly manual.
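Until 2D indexing is available, a cell can be located by hand from the flat element list. This is a minimal sketch, assuming a rectangular file where every row has exactly `num_cols` fields stored row by row, and assuming the caller already knows `num_cols` (it is not among the attributes listed here). `flat_index` and `num_rows` are hypothetical helpers for illustration, not part of mojo_csv.

```mojo
# Hypothetical helpers, not part of mojo_csv. They assume a rectangular CSV
# where every row has exactly `num_cols` fields, stored row by row in the
# reader's flat element list.


fn flat_index(row: Int, col: Int, num_cols: Int) -> Int:
    # Row-major layout: row `row` begins at element row * num_cols.
    return row * num_cols + col


fn num_rows(length: Int, num_cols: Int) -> Int:
    # Number of complete rows in a flat list of `length` elements.
    return length // num_cols


fn main():
    # In a 3-column file, (row 2, col 1) sits at flat index 2 * 3 + 1 = 7.
    print(flat_index(2, 1, 3))  # prints 7
    # 9 elements in a 3-column file make 3 rows.
    print(num_rows(9, 3))  # prints 3
```

With a reader, this would be used as `reader[flat_index(row, col, num_cols)]`, with `num_rows(reader.length, num_cols)` as the loop bound; if the file has a header line, it occupies row 0.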

@@ -109,72 +94,73 @@ reader[0] # first element

### Performance

- average times over 1k iterations
- 7950X @ 5.8 GHz (peak clock)
- uncompiled
- average times over 100-1k iterations
- AMD 7950X @ 5.8 GHz
- single-threaded

micro file benchmark (3 rows)
mini (100 rows)
small (1k rows)
medium file benchmark (100k rows)
large file benchmark (2m rows)
micro file benchmark (3 rows)
mini (100 rows)
small (1k rows)
medium file benchmark (100k rows)
large file benchmark (2m rows)

```log
✨ Pixi task (bench): mojo bench.mojo
running benchmark for micro csv:
✨ Pixi task (bench): mojo bench.mojo running benchmark for micro csv:
average time in ms for micro file:
0.01875
0.0094 ms
-------------------------
running benchmark for mini csv:
average time in ms for mini file:
0.07328
0.0657 ms
-------------------------
running benchmark for small csv:
average time in ms for small file:
0.417368
0.317 ms
-------------------------
running benchmark for medium csv:
average time in ms for medium file:
36.45899
24.62 ms
-------------------------
running benchmark for large csv:
average time in ms for large file:
1253.19458
878.6 ms
```

=== ThreadedCsvReader Performance Comparison ===
#### CSV Reader Performance Comparison
```
Small file benchmark (1,000 rows):
Single-threaded:
Average time: 0.455 ms
Multi-threaded:
Average time: 0.3744 ms
Speedup: 1.22 x

Medium file benchmark (100,000 rows):
Single-threaded:
Average time: 37.37 ms
Multi-threaded:
Average time: 24.46 ms
Speedup: 1.53 x

Large file benchmark (2,000,000 rows):
Single-threaded:
Average time: 1210.3 ms
Multi-threaded:
Average time: 863.9 ms
Speedup: 1.4 x

Small file benchmark (1,000 rows):
Single-threaded:
Average time: 0.500384 ms
Multi-threaded:
Average time: 0.451094 ms
Speedup: 1.11 x
-------------------------
Medium file benchmark (100,000 rows):
Single-threaded:
Average time: 38.124275 ms
Multi-threaded:
Average time: 24.650092 ms
Speedup: 1.55 x
-------------------------
Large file benchmark (2,000,000 rows):
Single-threaded:
Average time: 1175.345429 ms
Multi-threaded:
Average time: 830.02685 ms
Speedup: 1.42 x
-------------------------
Summary:
Small file speedup: 1.11 x
Medium file speedup: 1.55 x
Large file speedup: 1.42 x
Small file speedup: 1.22 x
Medium file speedup: 1.53 x
Large file speedup: 1.4 x
```

_Tested on AMD 7950X (16 cores) @ 5.8 GHz_
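The reported speedups match the ratio of the single-threaded to the multi-threaded average time. Worked through for the 1,000-, 100,000- and 2,000,000-row results above:

$$
S = \frac{t_{\text{single}}}{t_{\text{multi}}}, \qquad
S_{1\text{k}} = \frac{0.455}{0.3744} \approx 1.22, \quad
S_{100\text{k}} = \frac{37.37}{24.46} \approx 1.53, \quad
S_{2\text{M}} = \frac{1210.3}{863.9} \approx 1.40
$$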

## Future Improvements

- [ ] 2D indexing
- [ ] CsvWriter
- [ ] CsvDictReader
- [ ] SIMD optimization within each thread
- [ ] Async Chunking
- [ ] Streaming support for very large files
8 changes: 3 additions & 5 deletions recipes/mojo_csv/recipe.yaml
@@ -1,15 +1,13 @@
context:
  version: 1.4.0

  version: 1.5.0

package:
  name: "mojo_csv"
  version: ${{ version }}

source:
  - git: https://github.com/Phelsong/mojo_csv.git
    rev: d92e7b72933445c71c463d3f9eb52404dd01edf2

    rev: b3a9dc4422efbea7a94939e3a48ff4a3b03e3505

build:
  number: 0
@@ -18,7 +16,7 @@ build:

requirements:
  host:
    - max >=25.1.0,<26
    - max >=25.4.0,<26
  run:
    - ${{ pin_compatible('max') }}
