Llmware is not working in Sub linux system Ubuntu under Windows 11 #115

AlbelTec · 2023-11-21T12:56:11Z

Hi

I tried some experiments with Parser() (first parsing to JSON, then into memory), but in both cases I end up with the error: Segmentation fault.
I presume this has already been reported, but it seems no fix has been released so far.

kr,

JessBerl · 2023-11-21T13:45:15Z

Hi @AlbelTec - please try the workaround described in #48.

AlbelTec · 2023-11-21T14:12:04Z

Hi @JessBerl
Actually, I did use: ulimit -s 32768000
but I am still getting the error:

> Parsing folder: data...
Segmentation fault
(llmware) albel@Thinkpad:~/llmware$

Here is my code:

from llmware.parsers import Parser

# Not shown in the original snippet; the log output above prints "data"
dataDir = "data"

def parsing_pdf():
    # Create a parser
    parser = Parser()

    print(f"\n > Parsing folder: {dataDir}...")

    # Parse a single PDF into memory and show its first text block
    pdf_parsed_output = parser.parse_one_pdf("/home/albel/llmware/data/", "Large Language Models.pdf")
    page_number = pdf_parsed_output[0]["master_index"]
    block_text = pdf_parsed_output[0]["text"]
    print(f"\nFirst block found on page {page_number}:\n{block_text}")

    # Parse entire folder to json
    # blocks = parser.ingest_to_json(dataDir)
    # print(f"Total Blocks: {len(parser.parser_output)}")
    # print(f"Files Parsed:")
    # for processed_file in blocks["processed_files"]:
    #     print(f"  - {processed_file}")

parsing_pdf()

When parsing to JSON, the output is more verbose:

albel@Thinkpad:~/llmware$ source /home/albel/llmware/bin/activate
(llmware) albel@Thinkpad:~/llmware$ /home/albel/llmware/bin/python3.10 /home/albel/llmware/llmware_pdf.py

 > Parsing folder: data...
update: pdf_parser - START NEW PDF Processing - file path-/home/albel/llmware_data/tmp/parser_tmp/process_pdf_files/Large Language Models.pdf 
update: pdf_parser - build_obj_master_list - obj created - 3130 
update: pdf_parser - Catalog Dict - <<
/Type /Catalog
/Version /1.4
/Pages 2 0 R
/StructTreeRoot 3 0 R
/MarkInfo 4 0 R
/Lang (en-GB)
/ViewerPreferences 5 0 R
/Metadata 6 0 R
> 
update: pdf_parser - filelen - 5447062 
update: pdf_parser - created additional hidden objstm objects - 0 
update: pdf_parser - page count - 31- pages_found - 31 
update: pdf_parser - global font count- 40 
update: pdf_parser - PAGE PROCESSING-MAIN-LOOP -0-content entries-1 
Segmentation fault

turnham · 2023-11-21T17:32:58Z

Hi @AlbelTec I just tried on WSL2 (Windows 10) and was able to get things working with:

ulimit -s 160000

(The higher 32768000 value seems to be required only when running Linux in a container on Mac.)

However, in your case it looks like the ulimit setting might not be taking effect at all. You may be hitting this issue:

microsoft/WSL#633

Can you try the workaround suggested at the bottom of that issue?

sudo prlimit --stack=unlimited --pid $$; ulimit -s unlimited
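
To confirm whether the stack limit actually changed, it can help to read it from inside the Python process that runs the parser. The following is only a minimal sketch using the standard-library resource module (Unix only); the helper names stack_limit_kb and describe are made up for illustration and are not part of llmware. Values are converted to KiB so they can be compared with what ulimit -s prints.

```python
import resource


def stack_limit_kb():
    """Return the (soft, hard) stack limits in KiB; None means unlimited."""
    def to_kb(value):
        # getrlimit reports bytes, while `ulimit -s` reports 1024-byte units
        return None if value == resource.RLIM_INFINITY else value // 1024

    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    return to_kb(soft), to_kb(hard)


def describe(limit_kb):
    """Format a limit the way `ulimit -s` would: a number or 'unlimited'."""
    return "unlimited" if limit_kb is None else f"{limit_kb} KiB"


soft_kb, hard_kb = stack_limit_kb()
print(f"stack soft limit: {describe(soft_kb)}, hard limit: {describe(hard_kb)}")
```

If the soft limit printed here is still the default after running ulimit -s, the setting did not take effect for this process.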

AlbelTec · 2023-11-21T19:50:23Z

@turnham Thanks! It finally worked. The ulimit value was stuck at 8192. Running prlimit with root privileges set it to unlimited, and the issue is gone. The only drawback is that it has to be run for every session. I can live with that for now, until the Windows version is released.
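
One possible way to avoid repeating the command in every session is to launch the script through a small wrapper that raises the soft stack limit for the child process only. This is a sketch under assumptions, not a confirmed fix for this setup: run_with_stack and clamp_stack_limit are hypothetical helper names, the 160000 default is the value reported as sufficient on WSL2 earlier in this thread, and an unprivileged process cannot go above the hard limit. If the hard limit itself is stuck at a low value, the prlimit command with root privileges is still needed.

```python
import resource
import subprocess
import sys


def clamp_stack_limit(target_bytes, hard):
    """Return the largest soft limit <= target that the hard limit allows."""
    if hard == resource.RLIM_INFINITY:
        return target_bytes
    return min(target_bytes, hard)


def run_with_stack(cmd, target_kb=160000):
    """Run cmd in a child process whose soft stack limit is set first."""
    _, hard = resource.getrlimit(resource.RLIMIT_STACK)
    soft = clamp_stack_limit(target_kb * 1024, hard)

    def set_limit():
        # Runs in the child between fork and exec, so the parent shell
        # and this wrapper keep their own limits.
        resource.setrlimit(resource.RLIMIT_STACK, (soft, hard))

    return subprocess.run(cmd, preexec_fn=set_limit)


# Example (script name taken from the log above):
# run_with_stack([sys.executable, "/home/albel/llmware/llmware_pdf.py"])
```

The limit is applied before exec so that the new Python process starts with it already in place, which mirrors what ulimit -s does in a shell before launching a command.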

AlbelTec closed this as completed Nov 21, 2023
