diff --git a/recipes/mojo_csv/README.md b/recipes/mojo_csv/README.md
index ff8c5ae2..4c0b6bce 100644
--- a/recipes/mojo_csv/README.md
+++ b/recipes/mojo_csv/README.md
@@ -17,75 +17,60 @@ Add the Modular community channel (https://repo.prefix.dev/modular-community) to
 channels = ["conda-forge", "https://conda.modular.com/max", "https://repo.prefix.dev/modular-community"]
 ```
-`pixi add mojo_csv`
-##### Basic Usage
+```sh
+pixi add mojo_csv
+```
-```mojo
-from mojo_csv import CsvReader
-from pathlib import Path
+## Usage
-fn main():
-    var csv_path = Path("path/to/csv/file.csv")
-    var reader = CsvReader(csv_path)
-    for i in range(len(reader)):
-        print(reader[i])
-```
-##### Optional Usage
+By default uses all logical cores - 2
+```mojo
+    CsvReader(
+        in_csv: Path,
+        delimiter: String = ",",
+        quotation_mark: String = '"',
+        num_threads: Int = 0, # default = 0 = use all available cores - 2
+    )
+```
 ```mojo
 from mojo_csv import CsvReader
 from pathlib import Path
+from sys import exit
-fn main():
+fn main() raises:
     var csv_path = Path("path/to/csv/file.csv")
-    var reader = CsvReader(csv_path, delimiter="|", quotation_mark='*')
+    try:
+        var reader = CsvReader(csv_path)
+    except:
+        exit()
     for i in range(len(reader)):
         print(reader[i])
 ```
-#### BETA
-1.4.0 will be the last version where this isn't the default
-
-```mojo
-ThreadedCsvReader(
-    file_path: Path,
-    delimiter: String = ",",
-    quotation_mark: String = '"',
-    num_threads: Int = 0 # 0 = use all available cores
-)
-```
-### Example 1: Default (All Cores)
+### Delimiters
 ```mojo
-var reader = ThreadedCsvReader(Path("large_file.csv"))
-// Uses all 16 cores on a 16-core system
+    CsvReader(csv_path, delimiter=";", quotation_mark='|')
 ```
-### Example 2: Custom Thread Count
-
+### Threads
+__force single threaded__
 ```mojo
-var reader = ThreadedCsvReader(Path("data.csv"), num_threads=4)
-// Uses exactly 4 threads
+CsvReader(csv_path, num_threads = 1)
 ```
-
-### Example 3: Single-threaded
-
+__use all the threads__
 ```mojo
-var reader = ThreadedCsvReader(Path("data.csv"), num_threads=1)
-// Forces single-threaded execution (same as CsvReader)
-```
-
-### Example 4: Custom Delimiter
+from sys import num_logical_cores
-````mojo
-var reader = ThreadedCsvReader(
-    Path("pipe_separated.csv"),
-    delimiter="|",
-    num_threads=8
+var reader = CsvReader(
+    csv_path, num_threads = num_logical_cores()
 )
+```
+
 ### Attributes
@@ -99,7 +84,7 @@ reader.elements : List[String] # all delimited elements
 reader.length : Int # total number of elements
 ````
-##### Indexing
+### Indexing
 currently the array is only 1D, so indexing is fairly manual.
@@ -109,72 +94,73 @@ reader[0] # first element
 ### Performance
-- average times over 1k iterations
-- 7950x@5.8ghz (peak clock)
-- uncompiled
+- average times over 100-1k iterations
+- AMD 7950x@5.8ghz
 - single-threaded
-micro file benchmark (3 rows)
-mini (100 rows)
-small (1k rows)
-medium file benchmark (100k rows)
-large file benchmark (2m rows)
+micro file benchmark (3 rows)
+mini (100 rows)
+small (1k rows)
+medium file benchmark (100k rows)
+large file benchmark (2m rows)
 ```log
-✨ Pixi task (bench): mojo bench.mojo
-running benchmark for micro csv:
+✨ Pixi task (bench): mojo bench.mojo running benchmark for micro csv:
 average time in ms for micro file:
-0.01875
+0.0094 ms
 -------------------------
 running benchmark for mini csv:
 average time in ms for mini file:
-0.07328
+0.0657 ms
 -------------------------
 running benchmark for small csv:
 average time in ms for small file:
-0.417368
+0.317 ms
 -------------------------
 running benchmark for medium csv:
 average time in ms for medium file:
-36.45899
+24.62 ms
 -------------------------
 running benchmark for large csv:
 average time in ms for large file:
-1253.19458
+878.6 ms
 ```
-=== ThreadedCsvReader Performance Comparison ===
+#### CSV Reader Performance Comparison
+```
+Small file benchmark (1,000 rows):
+Single-threaded:
+Average time: 0.455 ms
+Multi-threaded:
+Average time: 0.3744 ms
+Speedup: 1.22 x
+
+Medium file benchmark (100,000 rows):
+Single-threaded:
+Average time: 37.37 ms
+Multi-threaded:
+Average time: 24.46 ms
+Speedup: 1.53 x
+
+Large file benchmark (2,000,000 rows):
+Single-threaded:
+Average time: 1210.3 ms
+Multi-threaded:
+Average time: 863.9 ms
+Speedup: 1.4 x
-Small file benchmark (1,000 rows):
-Single-threaded:
-Average time: 0.500384 ms
-Multi-threaded:
-Average time: 0.451094 ms
-Speedup: 1.11 x
--------------------------
-Medium file benchmark (100,000 rows):
-Single-threaded:
-Average time: 38.124275 ms
-Multi-threaded:
-Average time: 24.650092 ms
-Speedup: 1.55 x
--------------------------
-Large file benchmark (2,000,000 rows):
-Single-threaded:
-Average time: 1175.345429 ms
-Multi-threaded:
-Average time: 830.02685 ms
-Speedup: 1.42 x
--------------------------
 Summary:
-Small file speedup: 1.11 x
-Medium file speedup: 1.55 x
-Large file speedup: 1.42 x
+Small file speedup: 1.22 x
+Medium file speedup: 1.53 x
+Large file speedup: 1.4 x
+```
-_Tested on AMD 7950x (16 cores) @ 5.8GHz_
 ## Future Improvements
+- [ ] 2D indexing
+- [ ] CsvWriter
+- [ ] CsvDictReader
 - [ ] SIMD optimization within each thread
 - [ ] Async Chunking
 - [ ] Streaming support for very large files
diff --git a/recipes/mojo_csv/recipe.yaml b/recipes/mojo_csv/recipe.yaml
index 35566d60..c780a657 100644
--- a/recipes/mojo_csv/recipe.yaml
+++ b/recipes/mojo_csv/recipe.yaml
@@ -1,6 +1,5 @@
 context:
-  version: 1.4.0
-
+  version: 1.5.0
 package:
   name: "mojo_csv"
@@ -8,8 +7,7 @@ package:
 source:
   - git: https://github.com/Phelsong/mojo_csv.git
-    rev: d92e7b72933445c71c463d3f9eb52404dd01edf2
-
+    rev: b3a9dc4422efbea7a94939e3a48ff4a3b03e3505
 build:
   number: 0
@@ -18,7 +16,7 @@ build:
 requirements:
   host:
-    - max >=25.1.0,<26
+    - max >=25.4.0,<26
   run:
     - ${{ pin_compatible('max') }}