v10 #110

Merged
merged 395 commits into from Aug 31, 2023

Conversation

bamdadsabbagh
Collaborator

@bamdadsabbagh bamdadsabbagh commented May 31, 2023


id: "20230609-SSE-meeting-v10"
aliases:
  • ".notes/20230609: SSE Meeting v10"
tags:
  • "sse"
  • "dev"
  • "pro"
  • "sound scape explorer"
date: 20230609
.notes/20230609: SSE Meeting v10

[toc]

PR v10

#110

Major enhancements

Note

Timeline strategy

Different audio file lengths

Delivery

  • ETA: mid/end June
  • Still working on it by mid July
    • Merging business requirements to avoid undoing v10 new features
    • Implementing new business requirements
    • Improving the codebase

TODOs left over for v11

  • Processing: Investigate migrating to Python 3.10
    • We observe 6x better performance on Nicolas' M1 MacBook with 3.10 compared
      to 3.8
  • Processing/Config: Path importer for configuration file
    • Use Lana's sse-config-importer
  • Processing/Config: Investigate the use of a default range
  • Front/Histogram: Combine meta properties
  • Front/FIX: Unable to open h5 file created with docker with chromium
    • Reproduce with the help of Rémi
    • Could not reproduce.
  • Front/FIX: Reproduce the firefox error when loading h5wasm in web worker
    • Import at top module error
    • Could not reproduce.
  • Front/FIX: Console shows render and filter in an infinite loop
    • Could not reproduce.
  • Front/Queries: Migrate to plotly
    • Are queries still used? No users seem to use them, so this can be
      postponed.
  • Front/Queries: Adapt to dynamic filters and color scales
  • Front/Histogram: Add new traces to current plot
    • Need combination
  • Front/Metas: Combine multiple meta properties
    • Requires all derived values (volumes, matrices, pairings) to be computed in the Front

Notes from v9 PR #97

  • Processing/Autocluster: Add frozen thresholds, then remove setting from
    configuration file
    • Should we store the best consensus of all the given thresholds?
    • Or store all autoclusterings by their thresholds
    • Autocluster algorithm has been changed.
  • Processing/Trajectories: Add trajectories
  • Front/Time: Timestamp with timezone seems unstable
  • Front/Scatter: Navigate through collected points for easier exploration
    and audio listening
  • Front/ScatterNew: Migrate to plotly
    • onLoad: Rewrite meta selection to avoid iterating through properties and
      consuming time
      • Performance is good once loaded
    • Filter/Meta: Attach
    • Filter/Time: Attach
  • Front: Users currently cannot switch reducers once one has been loaded
  • Front/Scatter: Fix plotly on load with metas rewrite
  • Front/Scatter: Add trajectories
    • Add no trajectories case
    • Add multiple trajectories case
  • Front/Heatmaps: Make boxes clickable to apply corresponding filters
    • Use meta labels already provided to plotly heatmaps (matrices, pairings)
    • Requires rewrite of meta selection
  • Front/Meta: Sort autocluster
  • Front/Pairings: Fix autocluster sort
    • Processing: Sort from there?
    • Front: Sort for UI only?
      • Index map is not stored

Questions about timeline strategy

How would you prefer specifying the starting date for integration?

  1. Selected range
  2. New specific setting

I vote for 2.

Timeline Strategy recap

  • Instead of referring to files, we now refer to a timeframe
  • If an integrated portion of time is empty, we drop it
  • If an integrated portion of time is partially filled with audio, we keep it
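The recap above can be sketched as a small bucketing routine. This is a minimal illustration, not the project's implementation: the `integrate` function, the `(timestamp, features)` input shape, and the choice of a column-wise mean are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import fmean


def integrate(points, start, integration):
    """Average feature vectors per fixed-length time interval.

    points: iterable of (timestamp_in_seconds, feature_vector) pairs (assumed shape)
    start: timestamp at which the first interval begins
    integration: interval length in seconds
    """
    buckets = defaultdict(list)
    for timestamp, features in points:
        if timestamp < start:
            continue  # audio before the timeline start is ignored
        index = int((timestamp - start) // integration)
        buckets[index].append(features)

    # Intervals that received no audio never appear in `buckets`, so they are
    # dropped; intervals that are only partially filled are kept as-is.
    return {
        index: [fmean(column) for column in zip(*buckets[index])]
        for index in sorted(buckets)
    }


points = [(0, [1.0, 2.0]), (30, [3.0, 4.0]), (130, [5.0, 6.0])]
groups = integrate(points, start=0, integration=60)
# Interval 1 (60 s to 120 s) holds no audio and is absent from the result.
```

Keying groups by interval index rather than by file is what lets one group reference audio from several files.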

TODOs timeline

  • Processing/Groups: Add
  • Processing/Indicators: Adapt to timeline
  • Processing/Volumes: Adapt to timeline
  • Processing/Matrices: Adapt to timeline
  • Processing/Pairings: Adapt to timeline
  • Front: Show file site as reference instead of file name or file index
    • A Group (integrated interval / aggregation) can now reference multiple files
  • Front/Export: Add indicators
  • Front/Export: Add volumes
  • Front/Histogram: Show indicators (for single points) over x (time, ...)

TODOs meeting

Read upcoming documentation

  • App: Add pnpm dev:audio with concurrent processes to avoid opening 2 terminals
    • Example: pnpm dev:audio sample-lana/audio
  • App: When launching processing command, ask for python or python3
    • App: Add test:python for users to test their installed version
    • Doc: Add note about using python instead of python3, so users must make
      sure to have an alias
  • Doc: Add procedure
    • Assume users have no prior software knowledge
    • Doc: Add manual installation procedure
    • Doc: Add UNIX testing procedure
    • Doc: Add Windows testing procedure
    • Doc: Add PYTHONPATH
      • Appending processing path to PYTHONPATH programmatically
    • Doc: Add Docker installation procedure
    • Doc: Add execution policy for Windows
  • Doc: Add migration command example
  • Doc: Add conda documentation
  • Doc: Retrieve user manual from Nicolas
  • Doc: Add documentation for Docker with CUDA
    • It needs specific installation, link the official related documentation
  • App: Add PYTHONPATH to processing commands
  • Doc: Warn that v7 h5 files are no longer supported
  • Doc: Add documentation for mandatory reprocessing for upcoming v10
    • Removed migration documentation because the differences are too large and
      we are still before the first actual release.
  • Doc: Add console outputs
  • Doc: Add screenshots
  • Doc: Add screencasts
  • Doc: Add action workflow for global actions
  • Processing: Handle audio files of different durations
  • Processing/Extraction: Improve writing after each file extraction
    • Investigate other possible collisions
    • Verify with Nathan on Corridor's campaign
    • Do this for groups (integration) as well
  • Processing/Config: Investigate if we can merge base_path and audio_path
    • Processing/Config: Verify in Docker environment
  • Processing/Config: Add seconds to ranges and timestamps
  • Processing/Action: Add console feedback for autoclustering
  • Processing/Config: Force storing strings for sites and avoid numbers
  • Processing/Docker: Verify Docker start up script
  • Processing/TODO: Force uppercase for meta properties from configuration file
  • Processing/FIX: Manage audio files of different lengths
  • Processing/FIX: Integrating with 1 second from files of 40 seconds can result
    in group indexes above 40 (like 60...), so no audio can be played from them
    • Reproduce and fix the collision
    • Should be fixed with incremental writing to h5
  • Front: Adapt to new storage shape
    • Different audio file lengths
  • Front: User should be able to change audio_host setting
  • Front: Remove AppButton from AppModals because it does not render on Safari
    • Verify on MacBook
  • Front/FIX: Optional AUTOCLUSTER makes color rendering unexpected
    • Rémi: With no autocluster, all dots are black
    • This should be fixed by using plotly
  • Front/FIX: Downloading an audio file at 64 kHz actually yields 48 kHz audio
    • Reproduce
  • Front/AppDraggable: Add viewport visibility check on opening to reset position
    if out of bounds
  • Front/AppDraggable: Fix selected behaviour to display on top of other draggables
  • Frederic: Send Lana complete v9 campaign just for testing
  • Finances/June: Invoice
  • Finances/September: 3-month extension after a 1-month waiting period
  • Processing/Extractor: Add validation check to prevent block
    (see traceback below)
    • Fixed by writing incrementally to h5 when extracting features
Traceback (most recent call last):
  File "/home/sf39231h/git/sound-scape-explorer/processing/processing/actions/_all.py", line 15, in <module>
    run_files(env)
  File "/home/sf39231h/git/sound-scape-explorer/processing/processing/actions/run_files.py", line 30, in run_files
    extractor.yield_and_store_features(
  File "/home/sf39231h/git/sound-scape-explorer/processing/processing/extractors/ConfigFilesExtractor.py", line 119, in yield_and_store_features
    storage.write_features(
  File "/home/sf39231h/git/sound-scape-explorer/processing/processing/storage/Storage.py", line 630, in write_features
    flat_features.append(features[f][s])
IndexError: list index out of range
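The traceback above fails while indexing `features[f][s]` past the end of a list. A minimal sketch of the kind of validation check mentioned in the TODO follows. It is not the project's `Storage.write_features`: `flatten_features` and `expected_per_row` are hypothetical names, and the nested-list shape is an assumption inferred only from the failing line.

```python
def flatten_features(features, expected_per_row=None):
    """Flatten a nested list of features, validating row lengths first.

    features: list of rows, each row being a list of feature entries (assumed shape)
    expected_per_row: optional number of entries every row must contain
    """
    flat = []
    for f, row in enumerate(features):
        if expected_per_row is not None and len(row) != expected_per_row:
            # Fail early with a descriptive message instead of a bare IndexError.
            raise ValueError(
                f"features[{f}] has {len(row)} entries, "
                f"expected {expected_per_row}"
            )
        # Iterate over what is actually present rather than a fixed index range.
        flat.extend(row)
    return flat
```

Iterating over each row's real length tolerates audio files of different durations, while the optional `expected_per_row` check keeps a strict mode available when a fixed shape is required.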

Notes Rémi

  • Specify docker audio_path environment variable
    • Might create a specific project.env for each docker flavor
      • This was actually already done
  • Docker: Verify chown of h5
    • Verify RW from Docker after chown
    • Suggest Rémi's scripts for Docker generated files ownership
  • Add nodejs installation
  • Processing/Docker: Try to add libmp3lame-dev to process mp3 inside Docker
    • Consider libavcodec-dev
    • This might be unnecessary with the use of new package pydub
  • Processing/Indicators: soundfile python package does not support mp3 (libsndfile)
  • Processing/FIX: h5 file does not grow after GPU usage
    • Implement correct writing behaviour
  • Choose another port for servers?
    • More unusual, to have kind of a signature and avoid conflicts
    • Front: 5530
    • Audio: 5531
  • Front/ENH: Remove lightning button on Indicators/Matrices/Pairings
    and replace with onChange
  • Front/ENH: Fill background color of button when its modal is open
  • Front/Histogram: Sort autocluster series because labels are not in order
    • Fixed by sorting autocluster labels upon reading h5

Notes business

  • Only 30 points over 1 h with 60 s integration? To be checked on my side...
  • Odd timeline at 3600 (which misses 1 day out of 2) vs 7200,
    probably due to the resolution of the blocks in the bar?

TODOs from JR implementation

  • App/ProjectEnd: Save all dependencies from npm
    • Docker actually resolves this issue
  • Front/Histograms: Add legend (scale)
    • Was already implemented, did I get this wrong during the meeting?
  • Processing/Storage: Add computation dimensions
    • Store n calculation UMAPs
      • Provide purge command
    • Store calculation averaged matrix of distances
      • Attest size and lighten if necessary
        • This should not be an issue
      • Provide purge command
  • Processing/Config: Add autocluster pane
    • Processing/Config: Add documentation
  • Processing/Config: Add hdbscan settings (described above)
  • Processing/Autocluster: Adapt to new algo input
    (averaged matrix of distances replacing UMAP features)
  • Processing/Action: Add trajectories action
  • Processing: Add better raise messages
  • Dataframe is not exported correctly (autoclusters)
  • Front: Front handles indicators and derived values correctly; is the problem in writing?
  • Processing/Config: ConfigReducer refactor, picking, reconstruct for actual reductions
  • Processing/Storage: Reduced features do not get written in h5
  • Processing/Derived: Can have bad shape
    • Indicators
    • OverlapMatrix
  • App/CD: Processing pipeline is failing

TODOs during implementation

  • Processing/Config: Make ConfigSettings object for better setters and getters
    • Does not seem to be needed
  • Processing/Derived: Remove recursiveness for derived values
    • Indicators
    • Volumes
    • Matrices
    • Pairings
    • Reducers
  • Processing/Config: Create class for metas?
  • Processing/Trajectories: Diverge from JR implementation in the way of
    feeding features dataframe
  • Front/NPM: Remove unused dependencies like three
  • Front/Scatter: Remove and replace with ScatterNew
  • Processing/Groups: Current timeline strategy is wrong
    • It takes all audio within the interval and merges its VGG features
      into a mean
    • We should take all audio for a given single site value, but we
      should also get n means for n sites within the same interval
    • Need to store file and group indexes in order to keep flat storage
      otherwise this will be very messy
  • Storage(Front/Processing): Rename new storage path values for consistency
    • Stalled while running processing on big project
  • Front/Trajectories: Add coloring based on timestamps
  • Processing/Dataframe: Verify
  • Processing/Config: Remove setting group_start
    • timestamp_start is now programmatically computed
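The corrected grouping strategy described under Processing/Groups (one mean per site within each interval, with file and group indexes kept in flat storage) can be sketched as follows. This is only an illustration under assumptions: the function name, the `(timestamp, site, file_index, features)` input shape, and the row dictionaries are hypothetical and do not reflect the actual h5 layout.

```python
from collections import defaultdict
from statistics import fmean


def group_by_interval_and_site(points, start, integration):
    """Return one flat row per (interval, site) pair.

    points: iterable of (timestamp, site, file_index, features) (assumed shape)
    """
    buckets = defaultdict(list)
    for timestamp, site, file_index, features in points:
        interval = int((timestamp - start) // integration)
        buckets[(interval, site)].append((file_index, features))

    rows = []
    for interval, site in sorted(buckets):
        members = buckets[(interval, site)]
        vectors = [features for _, features in members]
        rows.append(
            {
                "group_index": interval,
                "site": site,
                # Keep the contributing files so audio can still be located.
                "file_indexes": sorted({index for index, _ in members}),
                "features": [fmean(column) for column in zip(*vectors)],
            }
        )
    return rows


points = [
    (0, "A", 0, [1.0, 1.0]),
    (10, "A", 1, [3.0, 3.0]),
    (20, "B", 2, [10.0, 10.0]),
    (70, "A", 1, [5.0, 5.0]),
]
rows = group_by_interval_and_site(points, start=0, integration=60)
```

With two sites present in the first interval, the result contains two rows for `group_index` 0, one per site, instead of a single merged mean.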

Project delays

List of reasons for the late deliveries:

  • Project specifications modification from existing codebase
    • More refactoring was needed
    • More time spent analysing all available options
  • Optimistic estimations
    • With new grouping and site handling

Commits containing BREAKING CHANGEs

From most recent to oldest.

@bamdadsabbagh bamdadsabbagh self-assigned this May 31, 2023
@bamdadsabbagh bamdadsabbagh added the enhancement New feature or request label May 31, 2023
@bamdadsabbagh bamdadsabbagh added this to the v10 milestone May 31, 2023
@bamdadsabbagh bamdadsabbagh force-pushed the next/v10 branch 3 times, most recently from 56f8b0e to 7c76c45 Compare July 8, 2023 14:23
@bamdadsabbagh
Collaborator Author

bamdadsabbagh commented Jul 11, 2023


id: "20230711-SSE-meeting-implementing-new-JR-method"
aliases:
  • "SSE Meeting: Implementing the new JR method"
tags:
  • "sse"
  • "dev"
  • "pro"
  • "sound scape explorer"
date: 20230711

SSE Meeting: Implementing the new JR method

[toc]

Useful links

Jeremy's presentation

With the new implementation, the computation method changes for:

  • Silhouettes
  • Autocluster
  • Trajectories (new)

These changes are welcome, both to validate the method and to improve
computation performance.

Method

Starting from the data generated by VGGish, we generate n UMAPs with a random seed.

Note

For the paper, we will generate n = 100 UMAPs.

In practice, however, 50 iterations are enough to get a difference rate below 1%.

Indeed, the UMAPs converge fairly quickly (iterating reduces the randomness
of this algorithm).

The number of requested dimensions should be neither too high (above ~10)
nor too low (below ~3).

For Lana's campaign, a sweet spot of 5 dimensions was determined.

Note

Below 5 dimensions, we consider that too much data is lost.

From these n 5D UMAPs, we do the following:

  • Generate 1 trajectory
    • By averaging the coordinates across the UMAPs, then tracing the trajectory.
    • From this trajectory, we derive the trajectory-specific indicators that
      allow trajectories to be compared with each other (the mean of the
      distances and the 95% quantiles).
    • We can add the 5% percentiles and display the result as a candlestick-style plot.
  • Generate n distance matrices, which are averaged into one single
    distance matrix.
    • This distance matrix is then used to compute the Silhouette and
      Autocluster indices.
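The two aggregation steps described above (averaging coordinates for the trajectory, and averaging n distance matrices into one) can be sketched in plain Python. This is a minimal illustration, not the project's code: embeddings are passed in as nested lists instead of being produced by UMAP, all function names are hypothetical, and treating "distances" as the lengths between consecutive trajectory points is an assumption.

```python
from math import sqrt
from statistics import fmean


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_embedding(embeddings):
    """Average each point's coordinates across n embeddings of the same points."""
    n_points = len(embeddings[0])
    return [
        [fmean(axis) for axis in zip(*(emb[p] for emb in embeddings))]
        for p in range(n_points)
    ]


def mean_distance_matrix(embeddings):
    """Compute one distance matrix per embedding and average them element-wise."""
    n_points = len(embeddings[0])
    return [
        [
            fmean(euclidean(emb[i], emb[j]) for emb in embeddings)
            for j in range(n_points)
        ]
        for i in range(n_points)
    ]


def step_lengths(trajectory):
    """Distances between consecutive points (assumed meaning of 'distances')."""
    return [euclidean(a, b) for a, b in zip(trajectory, trajectory[1:])]


# Two tiny 2D embeddings of the same two points, standing in for n UMAPs.
embeddings = [
    [[0.0, 0.0], [3.0, 4.0]],
    [[0.0, 0.0], [6.0, 8.0]],
]
trajectory = mean_embedding(embeddings)
distances = mean_distance_matrix(embeddings)
mean_step = fmean(step_lengths(trajectory))
```

Note that the averaged distance matrix is computed from the individual embeddings, not from the averaged coordinates, which matches the order of operations in the list above.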

Data flow

  • Files (VGGish)
  • Groups (Integration)
  • Robust scale (built into the UMAP computation)
  • n UMAP 5D
    • random_seed: None
    • min_dist: 0
    • distance_metric: manhattan
  • Distance matrix
    • metric: euclidean
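For readers unfamiliar with the robust scaling step in this data flow, here is a minimal sketch of one common definition: each feature column is centered on its median and divided by its interquartile range. Whether the project uses exactly this convention is an assumption; `robust_scale` is a hypothetical helper written with the standard library only.

```python
from statistics import median, quantiles


def robust_scale(rows):
    """Scale each column by (value - median) / IQR.

    rows: list of feature vectors of equal length (needs at least 2 rows)
    """
    scaled_columns = []
    for column in zip(*rows):
        center = median(column)
        # Quartiles with linear interpolation between data points.
        q1, _, q3 = quantiles(column, n=4, method="inclusive")
        spread = (q3 - q1) or 1.0  # constant column: avoid dividing by zero
        scaled_columns.append([(value - center) / spread for value in column])
    # Transpose back to one list per input row.
    return [list(row) for row in zip(*scaled_columns)]


scaled = robust_scale([[1.0], [2.0], [3.0], [4.0], [100.0]])
```

Because the median and interquartile range ignore extreme values, the outlier `100.0` does not compress the other points the way a mean and standard deviation would.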

Note

As a reminder, the UMAPs used for display (2D, 3D) use
the following parameters:

  • random_seed: 42000
  • min_dist: 0.1
  • distance_metric: manhattan

New configuration settings

  • UMAP dimensions: 5
  • UMAP iterations: 50
  • hdbscan
    • min_cluster_size: 15
    • min_samples: None (min_cluster_size)
    • alpha: 1
    • epsilon: 0.1
    • algo: eom | leaf

See TODOs for new Autocluster panel

Publishing settings

  • UMAP dimensions: 5
  • UMAP iterations: 100
  • hdbscan
    • min_cluster_size: 50 | 100
      • To be determined by Jeremy Rouch & Lana Minier
    • min_samples: None
    • alpha: 1
    • epsilon: 0
    • algo: eom & leaf
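The two sets of settings above could be captured in a small configuration object. The sketch below is an assumption-laden illustration, not the project's configuration format: `AutoclusterSettings` and its field names are hypothetical, and mapping `epsilon` to `cluster_selection_epsilon` and `algo` to `cluster_selection_method` (keyword names from the hdbscan library) is a guess at what the notes mean.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class AutoclusterSettings:
    """Hypothetical container; defaults follow the 'New configuration settings' list."""

    umap_dimensions: int = 5
    umap_iterations: int = 50
    min_cluster_size: int = 15
    min_samples: Optional[int] = None  # None: falls back to min_cluster_size
    alpha: float = 1.0
    epsilon: float = 0.1
    algo: str = "eom"  # "eom" or "leaf"

    def __post_init__(self):
        if self.algo not in ("eom", "leaf"):
            raise ValueError(f"algo must be 'eom' or 'leaf', got {self.algo!r}")

    def hdbscan_kwargs(self):
        """Keyword arguments under the assumed mapping to hdbscan parameter names."""
        return {
            "min_cluster_size": self.min_cluster_size,
            "min_samples": self.min_samples,
            "alpha": self.alpha,
            "cluster_selection_epsilon": self.epsilon,
            "cluster_selection_method": self.algo,
        }


default = AutoclusterSettings()

# Publishing settings: min_cluster_size is still undecided (50 or 100), so 50
# is used here only as a placeholder; both "eom" and "leaf" are to be run.
publishing_base = replace(
    default, umap_iterations=100, min_cluster_size=50, epsilon=0.0
)
publishing_runs = [replace(publishing_base, algo=a) for a in ("eom", "leaf")]
```

Freezing the dataclass and deriving variants with `replace` keeps the default and publishing presets from drifting apart by accident.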

TODOs

See [[20230609-SSE-meeting-v10]]

@bamdadsabbagh bamdadsabbagh force-pushed the next/v10 branch 13 times, most recently from 1a55af9 to 6968128 Compare July 17, 2023 20:56
@bamdadsabbagh bamdadsabbagh force-pushed the next/v10 branch 9 times, most recently from f848b0e to 028caa5 Compare July 21, 2023 15:01
@sonarcloud

sonarcloud bot commented Aug 31, 2023

SonarCloud Quality Gate passed.

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 0 (rating A)
  • Code Smells: 11 (rating A)
  • Coverage: no information
  • Duplication: 0.0%

@bamdadsabbagh bamdadsabbagh merged commit a269d98 into main Aug 31, 2023
8 checks passed
@bamdadsabbagh bamdadsabbagh deleted the next/v10 branch August 31, 2023 20:11