
fspiri/Data_helper_suite


Scope

Automate data gathering and organization before machine learning with easy-to-use snippets.


rename_snippet.py


guidelines

Place rename_snippet.py in the same folder as your train and/or val folders.

              ┌ rename_snippet.py
img_dataset ──┼ train ─┬ cats
              │        ├ lamas
              │        ╎
              │
              └ val ─┬ cats
                     ├ lamas
                     ╎

Run with: python rename_snippet.py
Supported formats: .jpg, .jpeg, .png (the list is easy to modify).
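Below is a minimal sketch of how the layout and format filter above could be read with pathlib. It assumes the img_dataset/<split>/<class> structure shown in the diagram. The names collect_images and SUPPORTED_EXTENSIONS are hypothetical and do not come from rename_snippet.py. Adding an extension to the tuple is the kind of change the format list refers to.

```python
from pathlib import Path

# Hypothetical: extensions treated as images; add more here to support other formats.
SUPPORTED_EXTENSIONS = (".jpg", ".jpeg", ".png")

def collect_images(dataset_root):
    """Map (split, class_name) -> sorted list of image paths.

    Assumes the layout <dataset_root>/<split>/<class_name>/<files>,
    where <split> is "train" and/or "val".
    """
    root = Path(dataset_root)
    found = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            continue  # train and val are both optional ("and/or")
        for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            # Case-insensitive extension check, so "photo.PNG" is also picked up.
            images = sorted(
                f for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in SUPPORTED_EXTENSIONS
            )
            found[(split, class_dir.name)] = images
    return found
```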

stats

Time Complexity: $\Theta(n)$, where $n$ is the number of files in the folder
Space Complexity: $\Theta(1)$
Parallelism not yet implemented

possible bug scenarios

Interrupting the process early: partially processed files are left in a temporary folder.
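One common way to get this behaviour is a two-phase rename through a temporary folder. The sketch below rests on several assumptions: the <class>_<index> naming scheme, the lowercased extension and the _rename_tmp folder name are all hypothetical and not confirmed by the snippet. Routing renames through a separate folder avoids collisions, such as b.jpg becoming cats_1.jpg while an original cats_1.jpg still exists. It also explains why an interruption leaves files in the temporary folder. Each file is moved twice, so the work stays linear in the number of files.

```python
import shutil
from pathlib import Path

TEMP_DIR_NAME = "_rename_tmp"  # hypothetical name for the temporary folder

def rename_class_folder(class_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Rename every image in class_dir to <class>_<index><ext> in two phases.

    Phase 1 moves files into a temporary sub-folder under their new names,
    so a new name never collides with a not-yet-renamed original.
    Phase 2 moves them back. If the process is interrupted, the files
    already moved remain in the temporary folder.
    """
    class_dir = Path(class_dir)
    tmp = class_dir / TEMP_DIR_NAME
    # No exist_ok: a leftover folder from an interrupted run raises
    # FileExistsError instead of being silently overwritten.
    tmp.mkdir()
    images = sorted(
        f for f in class_dir.iterdir()
        if f.is_file() and f.suffix.lower() in extensions
    )
    # Phase 1: move each image into tmp under its new name.
    for index, src in enumerate(images, start=1):
        new_name = f"{class_dir.name}_{index}{src.suffix.lower()}"
        shutil.move(str(src), str(tmp / new_name))
    # Phase 2: move the renamed files back, then remove the empty tmp folder.
    for f in sorted(tmp.iterdir()):
        shutil.move(str(f), str(class_dir / f.name))
    tmp.rmdir()
    return sorted(f.name for f in class_dir.iterdir() if f.is_file())
```

If a previous run was interrupted, move the files out of the leftover temporary folder by hand before running the sketch again.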



image_scraper_snippet.py


guidelines

Place image_scraper_snippet.py in the parent folder of your train and/or val folders.

                ┌ image_scraper_snippet.py
──img_dataset ──┼ train ─┬ cats
                │        ├ lamas
                │        ╎
                │
                └ val ─┬ cats
                       ├ lamas
                       ╎
  • requirements:
    • Google Chrome installed on your system. It is used as a guest, so no log-in is needed.
    • Selenium installed. If it isn't, run pip install selenium
  • run with python image_scraper_snippet.py
  • the snippet creates a downloads folder into which the results of all queries are downloaded.
  • the downloaded images are small and well suited to machine learning.
  • don't forget to enable variable-size input handling in your model, as the images come in a range of sizes.
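Below is a sketch of the downloads folder. It assumes one sub-folder per search query. The helper prepare_download_dirs and its name sanitising are hypothetical and do not come from image_scraper_snippet.py.

```python
import re
from pathlib import Path

def prepare_download_dirs(queries, root="downloads"):
    """Create <root>/<query> for each search query; return {query: path}.

    Hypothetical helper: the folder-per-query layout is an assumption.
    """
    root = Path(root)
    dirs = {}
    for query in queries:
        # Replace runs of characters that are awkward in folder names with "_".
        safe = re.sub(r"[^\w\-]+", "_", query.strip()).strip("_") or "query"
        path = root / safe
        path.mkdir(parents=True, exist_ok=True)
        dirs[query] = path
    return dirs
```

For example, the query "red panda" would be stored under downloads/red_panda.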

stats

Time per image: $\le 0.10$ s once the browser is open
Parallelism implemented: concurrent downloads
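Downloading is I/O-bound, so a thread pool is a simple way to run downloads concurrently. The sketch below uses only the standard library (urllib.request and concurrent.futures). Several details are assumptions: the names download_all and fetch_bytes, the worker count and the <index>.jpg file naming. The snippet itself may do this differently. fetch is a parameter so the network call can be swapped out.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def fetch_bytes(url, timeout=10):
    """Download one URL and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()

def download_all(urls, dest_dir, fetch=fetch_bytes, max_workers=8):
    """Download urls concurrently into dest_dir; return the saved paths in input order.

    Threads overlap the network waits. A URL that fails is skipped
    instead of aborting the whole batch.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    def worker(item):
        index, url = item
        try:
            data = fetch(url)
        except Exception:
            return None  # skip images that fail to download
        path = dest_dir / f"{index}.jpg"  # assumption: every image saved as .jpg
        path.write_bytes(data)
        return path

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(worker, enumerate(urls)))
    return [p for p in results if p is not None]
```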

possible bug scenarios

Most common bug: Chrome doesn't load properly or loads with different settings.

  • implemented solution: the program tries to re-open the browser up to 3 times. This doesn't always work.
  • user solution: be stubborn and re-run the program until it works. Usually 2 re-runs at most will do it.
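The retry strategy can be sketched as a small helper. It assumes the browser is created by a zero-argument callable that raises when Chrome fails to start. The name open_with_retries and its parameters are hypothetical. Because it only catches startup exceptions, a browser that opens with the wrong settings would still need a separate check.

```python
import time

def open_with_retries(open_browser, max_attempts=3, delay_s=1.0):
    """Call open_browser() until it succeeds, at most max_attempts times.

    Re-raises the last exception if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return open_browser()
        except Exception as exc:  # e.g. Chrome failed to start
            last_error = exc
            if attempt < max_attempts:
                time.sleep(delay_s)  # short pause before the next attempt
    raise last_error
```

With Selenium 4 installed, a call could look like open_with_retries(webdriver.Chrome) after from selenium import webdriver.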

