Linux support (rebased to latest version) #33

domna · 2023-11-07T13:58:26Z

This adds linux support for the latest version (based on #23) and integrates the ML models as git submodules.

ToDo:

Fix link at the end of transcription process

No window icon is set on Linux, since Tkinter does not seem to support .ico files there, nor .png or even .xbm. Excluding Linux resolves the error: `_tkinter.TclError: bitmap "noScribeLogo.ico" not defined`

kaixxx · 2023-11-07T15:45:56Z

Nice, thank you! We will take a closer look in the coming days. One question: I'm not familiar with git submodules. Can you explain briefly what you did there and what the advantages are?

domna · 2023-11-07T18:20:21Z

Sure, git submodules are essentially a git repository referenced inside your repository (the huggingface models are also git repositories). Here is a short summary from the git documentation. There are advantages but also some disadvantages to using submodules in a project.

The benefits are that the files in the submodules are integrated as if they were copied directly into these folders (i.e., exactly what you described in the download.txt files). They also keep the whole commit history of the referenced repository and are pinned to a fixed commit, so when the remote git repo is updated the submodule stays at this commit, ensuring that your software still works even if something changed remotely.
The disadvantages are that submodules add some overhead: you have to add --recurse-submodules when cloning the repo or invoke a git submodule init ... command to update them (otherwise they are just empty folders). Additionally, the huggingface repositories use git lfs (for large file support; TL;DR: one should not store large files in a git repo because this pollutes the versioning and blows up the .git folder, so git lfs replaces large files with an external reference, effectively taking them partly out of version control), so you also need to add this to your git setup, which complicates the situation further.

So it's up to you whether you want to use them in this repository. I introduced them because I think it's a good use case for submodules, but I'm also used to working with them, so I understand if it would complicate the repo situation too much (in that case I can remove the submodules from this PR if you like). If you'd like to keep them, I can also update the README to explain how to properly check out the repo with the submodules.

domna · 2023-11-07T18:23:08Z

This PR still has one shortcoming: the link at the end of the transcription does not work with noScribeEdit yet. I will probably first fall back to simply opening the html page in a browser and then see if I can get noScribeEdit to work on Linux. I could also envision integrating noScribeEdit as a submodule if you like the idea of working with submodules (that way it would not be necessary to call it via a command; it could instead be spawned from the python code directly, which would practically make it platform independent).

kaixxx · 2023-11-08T10:43:48Z

Thank you for the explanation regarding submodules. Your approach makes sense in the current situation. But for the future, there is a good chance that we will deviate from the official repo of faster-whisper and host our own models somewhere else. Even right now, the installer of noScribe already ships with a quantized version of the “small” model that is not in the official repo (you can still use the original model no problem, it’s just a bit bigger). In essence, I would rather not use submodules to stay flexible.

Regarding noScribeEdit: It would be great if you could get this running on Linux too. Having an easy way of controlling and correcting the transcripts is very important and was a crucial consideration for me from the very beginning of developing noScribe. One reason for switching from Word-Macros to my own editor (with v.0.4) was to make this correction function more accessible for people who are not using proprietary software.
Right now, the integration of noScribeEdit is a bit hacky: We compile it as a separate app (using pyinstaller) and copy the resulting binaries into the subfolder “noScribeEdit” in the main app folder (at least this is how I do it on Windows).
If you want to start the editor not as a binary but from its python source, you must implement this in the “launch_editor” function somehow. I would still like to keep the editor an independent app that can also be used separate from the main noScribe.

domna · 2023-11-08T11:25:51Z

> Thank you for the explanation regarding submodules. Your approach makes sense in the current situation. But for the future, there is a good chance that we will deviate from the official repo of faster-whisper and host our own models somewhere else. Even right now, the installer of noScribe already ships with a quantized version of the “small” model that is not in the official repo (you can still use the original model no problem, it’s just a bit bigger). In essence, I would rather not use submodules to stay flexible.

If this is your only concern, I think you can also do fine with the submodule approach. You can just add your own models as a submodule (currently the large and small model also live side by side in their own submodules), replace the old ones, or add your own data as a download. Actually, that's partly what submodules are built for: to have a drop-in replacement in a particular folder (just change the submodule and you're good to go, at least when the structure matches). As far as I can see you can switch between the two models in the GUI, right? So you could also build a json or yaml file relating these entries to the model folders, and then noScribe just picks the right folder. This way it would work for any model you would like to add. But that's also somewhat independent of using submodules, as they are really just checking out a git repo in a folder. So just let me know which way you'd like to go and I'll keep or remove the submodules based on your decision.
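A minimal sketch of the json mapping idea, not code from the PR. The registry keys `small` and `large`, the inline JSON, and the function name `resolve_model_dir` are assumptions made up for illustration; only the two model folder names come from the repo layout.

```python
import json
from pathlib import Path

# Hypothetical registry content. In practice this would live in a JSON (or YAML)
# file next to the app; the keys stand for the entries offered in the GUI.
MODEL_REGISTRY_JSON = """
{
  "small": "models/faster-whisper-small",
  "large": "models/faster-whisper-large-v2"
}
"""


def resolve_model_dir(gui_choice: str, app_dir: Path,
                      registry_text: str = MODEL_REGISTRY_JSON) -> Path:
    """Map a GUI entry to the folder that holds the corresponding model."""
    registry = json.loads(registry_text)
    try:
        relative = registry[gui_choice]
    except KeyError:
        known = ", ".join(sorted(registry))
        raise ValueError(f"Unknown model {gui_choice!r}; known entries: {known}") from None
    return app_dir / relative
```

With such a file, adding another model would only mean adding one entry to the registry; whether the folder is filled by a submodule or a manual download does not matter to this lookup.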

> Regarding noScribeEdit: It would be great if you could get this running on Linux too. Having an easy way of controlling and correcting the transcripts is very important and was a crucial consideration for me from the very beginning of developing noScribe. One reason for switching from Word-Macros to my own editor (with v.0.4) was to make this correction function more accessible for people who are not using proprietary software. Right now, the integration of noScribeEdit is a bit hacky: We compile it as a separate app (using pyinstaller) and copy the resulting binaries into the subfolder “noScribeEdit” in the main app folder (at least this is how I do it on Windows). If you want to start the editor not as a binary but from its python source, you must implement this in the “launch_editor” function somehow. I would still like to keep the editor an independent app that can also be used separate from the main noScribe.

Yes, I just quickly tried to get it to run on Linux yesterday and wasn't even aware of the noScribeEdit package. I only found out about it when I tried to click the link. Currently, I just spawn a web browser showing the html page if noScribeEdit could not be found. I think that's a good fallback behaviour even if noScribeEdit is not installed for whatever reason.
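The fallback described here could look roughly like the sketch below. It is not the code from the PR: the function name `open_transcript` is hypothetical, and it assumes the editor binary accepts the transcript path as its first command-line argument.

```python
import os
import subprocess
import webbrowser
from pathlib import Path
from typing import Callable, Optional, Sequence


def open_transcript(
    html_path: str,
    editor_path: Optional[str],
    run_editor: Callable[[Sequence[str]], object] = subprocess.Popen,
    open_browser: Callable[[str], object] = webbrowser.open,
) -> str:
    """Open the transcript in the editor if it exists, otherwise in the browser.

    Returns "editor" or "browser" so the caller can tell which path was taken.
    """
    if editor_path and os.path.isfile(editor_path):
        # Assumption: the editor takes the file to open as its first argument.
        run_editor([editor_path, html_path])
        return "editor"
    # Fallback: show the HTML transcript in the default web browser.
    open_browser(Path(html_path).resolve().as_uri())
    return "browser"
```

Passing the two launchers in as parameters keeps the decision logic testable without starting a real process or browser.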

I think porting noScribeEdit to linux will be straightforward. I only had a quick look, but it's already running. Only the conversion fails because there is no linux ffmpeg binary. So I think I just have to add this and add the appropriate os checks in the code.

Regarding the submoduling or using directly from python code: I think noScribeEdit can be used as is directly from python without changing anything in the repo structure. So it still can be supplied as a standalone package. I think the cleanest way would be plugin entry_points to load noScribeEdit as a plugin in noScribe. Do you have plans to provide noScribe and noScribeEdit as python packages? (There are also gui-scripts which generate a binary/exe file automatically for you).
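To make the entry_points idea concrete, here is a small sketch using `importlib.metadata` from the standard library. It assumes both projects were installed as Python packages, which is not the case today, and the group name `noscribe.editors` as well as both function names are hypothetical.

```python
import sys
from importlib.metadata import entry_points

EDITOR_GROUP = "noscribe.editors"  # hypothetical entry point group name


def find_editor_plugins(group: str = EDITOR_GROUP) -> dict:
    """Return {name: entry point} for every installed plugin in the group."""
    if sys.version_info >= (3, 10):
        found = entry_points(group=group)
    else:
        # On Python 3.8/3.9, entry_points() returns a dict keyed by group.
        found = entry_points().get(group, [])
    return {ep.name: ep for ep in found}


def launch_first_editor(html_path: str, group: str = EDITOR_GROUP) -> bool:
    """Load the first registered editor and call it; False if none is installed."""
    plugins = find_editor_plugins(group)
    if not plugins:
        return False
    name = sorted(plugins)[0]
    main = plugins[name].load()  # imports the module and returns the callable
    main(html_path)  # assumption: the plugin callable takes the file path
    return True
```

On the packaging side, the editor would declare itself in this group in its project metadata, and a `gui-scripts` entry could still provide the standalone launcher, so the editor stays usable on its own.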

I would suggest that I give porting noScribeEdit to linux a shot and would keep this PR for review as is (except the submoduling which I will adapt to what you decide). When I get noScribeEdit to work I'll make a separate PR in the noScribeEdit and this repo to re-enable the behaviour. Sounds good?

kaixxx · 2023-11-08T15:36:24Z

Hosting models elsewhere as a submodule would mean hosting them on some kind of git server, which is not trivial given the size of the files.
For noScribeEdit, submoduling could make more sense. But we had a lot of trouble getting the packaging (+ signing and notarization on macOS) with pyinstaller to work. So I would rather not mess with the structure of the project if not absolutely necessary.
Overall, I think using submodules is a bit complicated right now, so I would be happy if you could remove this from your commit if you are ok with that. I will keep the idea in the back of my mind for a potential future refactoring of the code.

Other than that, your suggestion sounds good. Using the web browser as a fallback for viewing is a good idea. But please add a message box explaining that noScribe comes with its own editor to review and correct the transcript and how people can install it (or, for the time being, that this editor will become available on Linux soon).

kaixxx · 2023-11-08T15:49:50Z

> Do you have plans to provide noScribe and noScribeEdit as python packages?

No, not really. Right now, noScribe is mainly targeted at not so tech-savvy people (qualitative researchers and journalists in particular). But maybe there are other use cases? May I ask what you are planning to do with noScribe? You seem to come from a quite different field (but also from Frankfurt, funnily enough).

domna · 2023-11-09T08:57:46Z

> Overall, I think using submodules is a bit complicated right now, so I would be happy if you could remove this from your commit if you are ok with that. I will keep the idea in the back of my mind for a potential future refactoring of the code.

Yes, no problem. I'll revert it to the previous state.

> Other than that, your suggestion sounds good. Using the web browser as a fallback for viewing is a good idea. But please add a message box explaining that noScribe comes with its own editor to review and correct the transcript and how people can install it (or, for the time being, that this editor will become available on Linux soon).

Yes, I can certainly add this.

> No, not really. Right now, noScribe is mainly targeted at not so tech-savvy people (qualitative researchers and journalists in particular). But maybe there are other use cases? May I ask what you are planning to do with noScribe? You seem to come from a quite different field (but also from Frankfurt, funnily enough).

Actually, I'm just helping out a friend who needed to run the software on a Linux system. But I think noScribe is a nice project and seems really helpful for leveraging such ML models for a wider audience, so I thought I'd make it a proper contribution instead of just hacking it together :).
And yes, I come from a very different field in natural sciences and programming.

domna · 2023-11-10T18:49:38Z

Hey @kaixxx, I removed the submoduling and added a call for noScribeEdit. There is a pull request for noScribeEdit which adds the necessary changes to let it run under linux.

kaixxx · 2023-11-13T21:22:44Z

This looks nice, thank you. Please give us a few days to review the code and test it on the various platforms.

One note: It seems that you had problems loading the logo icon under linux. There is also a png-version in the repo. You may try using that.

domna · 2023-11-14T17:17:30Z

Thanks for the note. I actually just carried this part over from #23 and @eckhrd's work and did not check it myself. But I will give the png a shot, if it works I'll just use this. Should the png then generally be used also for macOS and Win or do you prefer the ico there?

kaixxx · 2023-11-15T13:00:48Z

> Should the png then generally be used

No, I think we'll stick with the ico for Windows and MacOS. "Never change a working icon", you know...

domna · 2023-11-16T15:57:50Z

Works with png 👍️
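For reference, a platform-dependent icon setup in Tkinter could be sketched as follows. This is an illustration, not the merged code: the PNG file name `noScribeLogo.png` and both function names are assumptions, while `iconbitmap` and `iconphoto` are the standard Tkinter window methods.

```python
import platform


def icon_for(system: str) -> str:
    """Pick the icon file per platform: PNG on Linux, ICO on Windows and macOS."""
    # The PNG file name is assumed; only the .ico name appears in this thread.
    return "noScribeLogo.png" if system == "Linux" else "noScribeLogo.ico"


def apply_icon(root, system: str = platform.system()) -> None:
    """Set the window icon on a Tk root window."""
    import tkinter as tk  # imported lazily so icon_for() works without a display

    icon = icon_for(system)
    if icon.endswith(".png"):
        # Keep a reference on the window so the image is not garbage-collected.
        root._icon_image = tk.PhotoImage(file=icon)
        root.iconphoto(True, root._icon_image)
    else:
        root.iconbitmap(icon)
```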

changing `platform.system() in ["Darwin", "Linux"]` to `("Darwin", "Linux")`

The editor is a vital part of noScribe, so I don't want the launch to fail silently, using the browser as a fallback. It now throws an error if noScribeEdit is not found. Second small change: `program: str | None = None` seems to be python 3.10 syntax. Changed it to `program: str = None`
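A short sketch of both points, with hypothetical function names. The `X | None` annotation form needs Python 3.10 or newer when evaluated at definition time; `Optional[str]` from `typing` is shown here as one alternative spelling for older versions that still documents that `None` is allowed, whereas the change described above uses the plain `str = None`.

```python
import os
import platform
from typing import Optional

UNIX_LIKE = ("Darwin", "Linux")  # tuple, matching the consistent system checks


def is_unix_like(system: Optional[str] = None) -> bool:
    """True on macOS and Linux; the argument exists only to ease testing."""
    return (system or platform.system()) in UNIX_LIKE


def require_editor(program: Optional[str] = None) -> str:
    """Return the editor path, or raise instead of failing silently."""
    if not program or not os.path.isfile(program):
        raise FileNotFoundError(f"noScribeEdit not found: {program!r}")
    return program
```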

kaixxx · 2023-12-02T14:07:12Z

@domna Thank you again for the great work. I have merged the PR with minor adjustments.

One last thing for now: Could you write some short installation instructions for Linux? See this section of the Readme: https://github.com/kaixxx/noScribe#download-and-installation

kaixxx · 2023-12-02T15:34:19Z

@domna Forgot something: Do you have binaries for us to share?

menelic · 2023-12-02T20:53:06Z

This is great news, thanks for the work on this. But please create a new release with install instructions; it is not clear to me whether the current main branch can be installed on Linux.

domna · 2023-12-03T11:45:21Z

> @domna Thank you again for the great work. I have merged the PR with minor adjustments.
>
> One last thing for now: Could you write some short installation instructions for Linux? See this section of the Readme: https://github.com/kaixxx/noScribe#download-and-installation

Actually, I just ran python directly in the repo and didn't create any binary. This should be sufficient for running it:

git clone https://github.com/kaixxx/noScribe.git
cd noScribe
pip install -r environments/requirements_linux.txt
python noScribe.py

It should work for any Python version noScribe supports and is ideally done in a new environment. noScribeEditor, of course, has to be built separately and put into the correct location.

I'll try to create an installer with pyinstaller and provide it. It will take some time though as I don't have too much time to work on this right now (but until then the above install instructions should help) and I don't have any experience with pyinstaller.

gernophil · 2023-12-03T16:45:38Z

Building for Linux and macOS shouldn't be that different. I can provide my spec file and shell command (I might turn it into an .sh next time) to build it (it's a bit manual, but it works). Maybe give it a try. You will need to make some changes. It's been a while since I used Linux, but in general most of it should work. I know it's not best practice to use --add-data for adding Python modules, but (as I might have mentioned) it works and I haven't found the time to optimize that yet :).

shell (zsh) command:

mkdir noScribe_build &&  # create new folder
cd noScribe_build &&  # move to that folder
source /path/to/noScribe_venv/bin/activate &&  # activate noScribe venv
pyinstaller --clean --noconfirm /path/to/noScribe.spec 2> pyinstaller_noScribe.log &&  # run pyinstaller for noScribe
deactivate &&  # deactivate noScribe venv
rm -rf build noScribe.spec dist/noScribe &&  # remove unnecessary files
mv dist noScribe &&  # rename folder 'dist' to 'noScribe'
source /path/to/noScribeEdit_venv/bin/activate &&  # activate noScribeEditor venv
# run pyinstaller for noScribeEdit (you might need to change some flags for linux)
pyinstaller \
--noconfirm \
--onedir \
--windowed \
--icon "/path/to/noScribeEditor/noScribeEditLogo.ico" \
--add-binary "/path/to/noScribeEditor/ffmpeg_linux/ffmpeg:ffmpeg_linux/." \
--add-binary "/path/to/noScribeEditor/ffmpeg_linux/ffplay:ffmpeg_linux/." \
"/path/to/noScribeEditor/noScribeEdit.py" 2> pyinstaller_noScribeEdit.log &&
deactivate &&  # deactivate noScribeEdit venv
mv dist/noScribeEdit.app noScribe/noScribeEdit.app &&  # move binary to same folder as noScribe (it's not an .app for linux)
rm -rf build noScribeEdit.spec dist  # remove unnecessary files

.spec file:

# -*- mode: python ; coding: utf-8 -*-
from PyInstaller.utils.hooks import collect_all

# noScribe

noScribe_a = Analysis(
    ['/path/to/noScribe/noScribe.py'],
    pathex=[],
    binaries=[],
    datas=[
        ('/path/to/noScribe/trans', 'trans/'),
        ('/path/to/noScribe/graphic_sw.png', '.'),
        ('/path/to/noScribe/ffmpeg', '.'),
        ('/path/to/noScribe/models/faster-whisper-small', 'models/faster-whisper-small/'),
        ('/path/to/noScribe/models/faster-whisper-large-v2', 'models/faster-whisper-large-v2/'),
        ('/path/to/noScribe/prompt.yml', '.'),
        ('/path/to/noScribe/LICENSE.txt', '.'),
        ('/path/to/noScribe/README.md', '.'),
    ],
    hiddenimports=[],
    hookspath=[],
    hooksconfig={},
    runtime_hooks=[],
    excludes=[],
    noarchive=False,
)

noScribe_pyz = PYZ(noScribe_a.pure)

noScribe_exe = EXE(
    noScribe_pyz,
    noScribe_a.scripts,
    [],
    exclude_binaries=True,
    name='noScribe',
    debug=False,
    bootloader_ignore_signals=False,
    strip=False,
    upx=True,
    console=False,
    disable_windowed_traceback=False,
    argv_emulation=False,
    target_arch=None,
    codesign_identity=None,
    entitlements_file=None,
    icon=['/path/to/noScribe/noScribeLogo.ico'],
)

# diarize

diarize_datas = [
    ('/path/to/noScribe/models/pyannote_config.yaml', 'models/.'),
    ('/path/to/noScribe_venv/lib/python3.9/site-packages/lightning', 'lightning/'),
    ('/path/to/noScribe_venv/lib/python3.9/site-packages/lightning_fabric', 'lightning_fabric'),
    ('/path/to/noScribe_venv/lib/python3.9/site-packages/torchaudio', 'torchaudio'),
    ('/path/to/noScribe_venv/lib/python3.9/site-packages/pyannote', 'pyannote/'),
    ('/path/to/noScribe_venv/lib/python3.9/site-packages/pytorch_metric_learning', 'pytorch_metric_learning/'),
    ('/path/to/noScribe_venv/lib/python3.9/site-packages/sklearn', 'sklearn/'),
    ('/path/to/noScribe/models/pytorch_model.bin', 'models/.'),
    ('/path/to/noScribe_venv/lib/python3.9/site-packages/asteroid_filterbanks', 'asteroid_filterbanks/'),
    ('/path/to/noScribe_venv/lib/python3.9/site-packages/pytorch_lightning', 'pytorch_lightning/'),
    ('/path/to/noScribe/models/torch', 'models/torch/'),
]
diarize_binaries = []
diarize_hiddenimports = []
diarize_tmp_ret = collect_all('speechbrain')
diarize_datas += diarize_tmp_ret[0]
diarize_binaries += diarize_tmp_ret[1]
diarize_hiddenimports += diarize_tmp_ret[2]

diarize_a = Analysis(
    ['/path/to/noScribe/diarize.py'],
    pathex=[],
    binaries=diarize_binaries,
    datas=diarize_datas,
    hiddenimports=diarize_hiddenimports,
    hookspath=[],
    hooksconfig={},
    runtime_hooks=[],
    excludes=[],
    noarchive=False,
)

diarize_pyz = PYZ(diarize_a.pure)

diarize_exe = EXE(
    diarize_pyz,
    diarize_a.scripts,
    [],
    exclude_binaries=True,
    name='diarize',
    debug=False,
    bootloader_ignore_signals=False,
    strip=False,
    upx=True,
    console=True,
    disable_windowed_traceback=False,
    argv_emulation=False,
    target_arch=None,
    codesign_identity=None,
    entitlements_file=None,
)

# final

coll = COLLECT(
    noScribe_exe,
    noScribe_a.binaries,
    noScribe_a.datas,
    diarize_exe,
    diarize_a.binaries,
    diarize_a.datas,
    strip=False,
    upx=True,
    upx_exclude=[],
    name='noScribe',
)

app = BUNDLE(
    coll,
    name='noScribe.app',
    icon='/path/to/noScribe/noScribeLogo.ico',
    bundle_identifier='org.noScribe.noScribe',
)

kaixxx · 2024-05-29T16:35:05Z

@domna: I've got quite a few requests over the last couple of weeks from people interested in running noScribe on Linux. I always point them to your instructions above. But not all of them are comfortable running python etc.

So it would be absolutely wonderful if you could find the time to make a binary...

menelic · 2024-05-30T07:21:10Z

@domna thank you for the Linux script. I'd like to respectfully request yet another step towards easy usage on Linux: an AppImage and/or Flatpak or even a snap would also be greatly appreciated. AppImages seem to me the best solution: they are distro-agnostic, package their dependencies, and can easily be updated. Unlike Flatpak or snap, they don't require anything else to be installed to work (that said, I do see the advantages of Flatpak).
@kaixxx it looks like AppImage creation can be automated here on GitHub once the recipe is implemented: https://appimage-builder.readthedocs.io/en/latest/intro/tutorial.html and https://www.reddit.com/r/AppImage/comments/xmgvmx/just_created_my_first_github_action_to_automate/

domna · 2024-05-30T10:00:09Z

> @domna: I've got quite a few requests over the last couple of weeks from people interested in running noScribe on Linux. I always point them to your instructions above. But not all of them are comfortable running python etc.
>
> So it would be absolutely wonderful if you could find the time to make a binary...

Sure. I'm traveling right now but I'll look into it next week.

kaixxx · 2024-06-05T10:18:02Z

@domna: Thank you! We had a few problems with the most recent version on Linux, but they seem to be resolved, see: #60

domna · 2024-06-12T09:26:55Z

> @domna thank you for the Linux script. I'd like to respectfully request yet another step towards easy usage on Linux: an AppImage and/or Flatpak or even a snap would also be greatly appreciated. AppImages seem to me the best solution: they are distro-agnostic, package their dependencies, and can easily be updated. Unlike Flatpak or snap, they don't require anything else to be installed to work (that said, I do see the advantages of Flatpak).

Hey @menelic,

Sorry, I don't really have time to create such a package, as I'm not really experienced with that and would need to learn the whole process. However, I have now contributed a build spec for binary files (#63), and once this is distributed somewhere you should be able to download and run noScribe easily. I think it would be easy to use this as a basis for an AppImage or Flatpak if someone is interested in creating and maintaining this.

eckhrd and others added 10 commits November 7, 2023 15:16

Initial Linux support

3f17ce8

Tkinter does not seem to support .ico files on Linux

8b8f1ad

Setting no window icon on Linux, since TKinter does not seem to support .ico files, nor .png or even .xbm. Excluding Linux resolves the error: _tkinter.TclError: bitmap "noScribeLogo.ico" not defined

Remove whisper.cpp thread setting form this PR

ecc0b18

Rebases linux support

fd8bc00

Remove obsolete whisper binary

afc9129

Remove faster-whisper-large-v2 folder

e6f797c

Add faster-whisper-large-v2 as submodule

c460277

Remove small-whisper-folder

1edce7e

Adds whisper small as submodule

49c07ad

Adds linux dependencies

5f29c46

pyannote support for linux

1e20a78

domna changed the title ~~Linux support~~ Linux support (rebased to latest version) Nov 7, 2023

domna added 4 commits November 8, 2023 00:07

Fallback to opening webbrowser when noScribeEdit is not available

5c8a285

Fix check for empty filename

aab87c6

Apply larger window to linux, too

8691cce

Generate requirements w/o hashes

e438b98

domna mentioned this pull request Nov 10, 2023

Linux support kaixxx/noScribeEditor#1

Merged

domna added 3 commits November 10, 2023 14:07

Delete submodules

5b43ba5

Re-add download.txt

7a2179f

Adds noScribeEdit call for linux

0509f13

Adds buildspec for linux

cd2d79f

Use png as linux icon

a6e999f

domna and others added 3 commits November 22, 2023 17:57

Merge branch 'main' into linux

3acf90e

Making system checks more consistent (always tuples)

ff6d950

changing `platform.system() in ["Darwin", "Linux"]` to `("Darwin", "Linux")`

kaixxx merged commit 42c4ebc into kaixxx:main Dec 2, 2023

kaixxx mentioned this pull request Dec 2, 2023

Linux support #23

Closed

domna mentioned this pull request Jun 11, 2024

Adds linux build spec #63

Merged
