Feature/grab musicians from el recodo #146

Merged on Jul 23, 2024 (141 commits)
Changes from 2 commits
Commits
f7e90ac
initial commit for grab musicians from el recodo
jmarsh24 Jul 15, 2024
7012fab
wip
jmarsh24 Jul 16, 2024
f2d90f6
correct file anme
jmarsh24 Jul 17, 2024
df18048
add person roles
jmarsh24 Jul 17, 2024
e559c0d
add credentials
jmarsh24 Jul 17, 2024
556c8ed
add
jmarsh24 Jul 17, 2024
444e664
add migrations
jmarsh24 Jul 17, 2024
2273f3f
Merge branch 'main' into feature/grab_musicians_from_el_recodo
jmarsh24 Jul 17, 2024
c68b26e
add models
jmarsh24 Jul 17, 2024
94a6c85
person and song scraper working
jmarsh24 Jul 17, 2024
aa00717
working auth
jmarsh24 Jul 17, 2024
2f6d5b2
song scraper working
jmarsh24 Jul 17, 2024
28dbd4a
faraday raises error
jmarsh24 Jul 17, 2024
9d10117
working role_manager
jmarsh24 Jul 18, 2024
4f39ada
spec working slighly more
jmarsh24 Jul 18, 2024
7a1c2f2
things are green!
jmarsh24 Jul 18, 2024
c9c95c9
encode all urls
jmarsh24 Jul 18, 2024
4ce231e
fix
jmarsh24 Jul 18, 2024
5a73c71
add spec for biagi
jmarsh24 Jul 18, 2024
0e4e308
fix broken
jmarsh24 Jul 18, 2024
60bb523
make person scraper more robust
jmarsh24 Jul 18, 2024
e013c0a
add intrumental test
jmarsh24 Jul 18, 2024
673c46d
fix attachment
jmarsh24 Jul 18, 2024
6e0acc2
role manager working
jmarsh24 Jul 18, 2024
f096f3e
fix deprecation error
jmarsh24 Jul 18, 2024
fc39153
rake task works
jmarsh24 Jul 18, 2024
b6570c6
add
jmarsh24 Jul 18, 2024
f2f676c
add songs
jmarsh24 Jul 18, 2024
b41d670
debugging with justin
dogacozen Jul 18, 2024
92d07e7
Merge branch 'feature/grab_musicians_from_el_recodo' of https://githu…
dogacozen Jul 18, 2024
3526051
Merge branch 'feature/grab_musicians_from_el_recodo' of https://githu…
dogacozen Jul 18, 2024
beaed32
add avo admin
jmarsh24 Jul 18, 2024
afbe324
split on y
jmarsh24 Jul 18, 2024
74b69fa
fix ids
jmarsh24 Jul 18, 2024
9e64630
add faraday and redis
jmarsh24 Jul 18, 2024
669b481
add throttler
jmarsh24 Jul 18, 2024
977477d
add redis
jmarsh24 Jul 18, 2024
10d1930
workig
jmarsh24 Jul 18, 2024
70fe8d7
working
jmarsh24 Jul 19, 2024
b592276
sync songs jobs spec
jmarsh24 Jul 19, 2024
2d27cca
working sleep
jmarsh24 Jul 19, 2024
af0b3b4
convert to integer
jmarsh24 Jul 20, 2024
f307650
change role names to match html values
dogacozen Jul 20, 2024
742e175
Merge branch 'feature/grab_musicians_from_el_recodo' of https://githu…
jmarsh24 Jul 20, 2024
c8b3e9c
push
jmarsh24 Jul 20, 2024
277d9f8
allow auth to autogenerate
jmarsh24 Jul 20, 2024
72d2298
add cello role
jmarsh24 Jul 20, 2024
e673a78
update to 30 seconds
jmarsh24 Jul 20, 2024
7ed9dba
switch to orchestra name
jmarsh24 Jul 21, 2024
d0df7aa
use this locally
jmarsh24 Jul 21, 2024
3e4ca3a
ert_number
jmarsh24 Jul 21, 2024
89741a6
total songs
jmarsh24 Jul 21, 2024
f8a26ed
annotate
jmarsh24 Jul 21, 2024
4e8b732
fix syncronizer
jmarsh24 Jul 21, 2024
8cc6abd
remove binding
jmarsh24 Jul 21, 2024
80934b1
fix roles
jmarsh24 Jul 21, 2024
48b78b6
remove binding
jmarsh24 Jul 21, 2024
19f6431
first minor refacto on song scraper
dogacozen Jul 21, 2024
9857c08
minor refacto on song scraper tests
dogacozen Jul 21, 2024
f1d2b38
minor refacto on song scraper tests
dogacozen Jul 21, 2024
d46d36a
update test
jmarsh24 Jul 21, 2024
3e16974
fix
jmarsh24 Jul 21, 2024
e2a0cb8
broken
jmarsh24 Jul 21, 2024
2988aaf
allow scrapers to be run on their own
jmarsh24 Jul 21, 2024
254f224
push latest
jmarsh24 Jul 21, 2024
1f23a40
fix
jmarsh24 Jul 21, 2024
3427b55
remove comments
jmarsh24 Jul 21, 2024
310bb97
add progress
jmarsh24 Jul 21, 2024
d72561f
oops
jmarsh24 Jul 21, 2024
9fac9e2
fix
jmarsh24 Jul 21, 2024
15458d7
cleanup
jmarsh24 Jul 21, 2024
0bd7921
fix www
jmarsh24 Jul 21, 2024
4d14bdb
remove comment
jmarsh24 Jul 21, 2024
1277e68
hide synced at on index
jmarsh24 Jul 21, 2024
f6e41e4
fix date parse
jmarsh24 Jul 21, 2024
17b30da
fix
jmarsh24 Jul 21, 2024
f8f5d5b
don't reimport the same ert_number twice if empty page
jmarsh24 Jul 21, 2024
a8d6e59
working
jmarsh24 Jul 21, 2024
46c51a6
should return nil
jmarsh24 Jul 21, 2024
bf772f2
and missing migrations
jmarsh24 Jul 21, 2024
bd90934
scraper
jmarsh24 Jul 21, 2024
5ef30c6
fix
jmarsh24 Jul 21, 2024
dda71c4
rescue nil
jmarsh24 Jul 21, 2024
7096f9f
rubocop -A
jmarsh24 Jul 21, 2024
1b8059d
add sleep after twoo many requests error
jmarsh24 Jul 21, 2024
cd7b364
add orchestra and image to import
jmarsh24 Jul 22, 2024
96f3d15
fix bug
jmarsh24 Jul 22, 2024
6c082a6
update schema
jmarsh24 Jul 22, 2024
2206fdc
properly install avo admin and pundit policies
jmarsh24 Jul 22, 2024
3b91b49
metaprogram el recodo song
jmarsh24 Jul 22, 2024
22a7d04
add solid cache migrations
jmarsh24 Jul 22, 2024
03323d3
working dropdown
jmarsh24 Jul 22, 2024
10c3897
add dropdown
jmarsh24 Jul 22, 2024
5f01ec3
make dynamic
jmarsh24 Jul 22, 2024
62fce19
remove search data
jmarsh24 Jul 22, 2024
7a65d76
add hidden controller
jmarsh24 Jul 22, 2024
04cd50d
search working
jmarsh24 Jul 22, 2024
c68ce8d
adjust styling
jmarsh24 Jul 22, 2024
20d8f61
add retry to auth
jmarsh24 Jul 22, 2024
efdee72
require the retry
jmarsh24 Jul 22, 2024
53b12de
update
jmarsh24 Jul 22, 2024
90333c4
update index
jmarsh24 Jul 22, 2024
95632b1
wip
jmarsh24 Jul 22, 2024
482c014
use the song_builder
jmarsh24 Jul 22, 2024
54b39dc
fix
jmarsh24 Jul 22, 2024
4ab5fd4
remove items
jmarsh24 Jul 22, 2024
116a622
fix
jmarsh24 Jul 22, 2024
c783696
hide on index
jmarsh24 Jul 22, 2024
0c5735f
rubocop
jmarsh24 Jul 22, 2024
6f7c2b0
capitalize
jmarsh24 Jul 22, 2024
bd3571a
update
jmarsh24 Jul 22, 2024
9150338
make nullable
jmarsh24 Jul 22, 2024
1ce7f6d
remove unique index
jmarsh24 Jul 22, 2024
e4f5bb9
rename rake task
jmarsh24 Jul 22, 2024
548897b
remove deploy and app ymls
jmarsh24 Jul 22, 2024
fa130db
test revised
jmarsh24 Jul 22, 2024
2330c36
drop migrations
jmarsh24 Jul 22, 2024
c3e7f52
delete db
jmarsh24 Jul 22, 2024
57e96fc
refactor ci
jmarsh24 Jul 22, 2024
7dbdaf4
fix yarn
jmarsh24 Jul 22, 2024
b8a11f6
create elasticsearch url
jmarsh24 Jul 22, 2024
a6d83a1
add working directory
jmarsh24 Jul 22, 2024
2dea744
remove
jmarsh24 Jul 22, 2024
c1ed62c
remove
jmarsh24 Jul 22, 2024
2ebbd65
yay
jmarsh24 Jul 22, 2024
56dac03
add rover
jmarsh24 Jul 22, 2024
b7976fb
graphql
jmarsh24 Jul 22, 2024
7609040
working temporary
jmarsh24 Jul 22, 2024
0c489be
lint yarn
jmarsh24 Jul 22, 2024
213896c
allow 127.0.0.1
jmarsh24 Jul 22, 2024
0509a77
fix
jmarsh24 Jul 22, 2024
a4d6b11
if blank
jmarsh24 Jul 22, 2024
c8e0e9f
5 seconds
jmarsh24 Jul 22, 2024
77798c0
test
jmarsh24 Jul 22, 2024
29631dd
Merge branch 'main' into feature/grab_musicians_from_el_recodo
jmarsh24 Jul 22, 2024
4107130
rubocop
jmarsh24 Jul 22, 2024
607bd7c
remove duplicate migration
jmarsh24 Jul 22, 2024
3647cf9
remove duplicate migrations
jmarsh24 Jul 22, 2024
79cfe7a
reording
jmarsh24 Jul 22, 2024
40e3105
working migration
jmarsh24 Jul 22, 2024
931dd79
change to 10 seconds by default
jmarsh24 Jul 23, 2024
11 changes: 4 additions & 7 deletions api/Gemfile.lock
@@ -215,12 +215,11 @@ GEM
       railties (>= 5.0.0)
     faker (3.4.1)
       i18n (>= 1.8.11, < 2)
-    faraday (2.9.2)
-      faraday-net_http (>= 2.0, < 3.2)
+    faraday (1.2.0)
+      multipart-post (>= 1.2, < 3)
+      ruby2_keywords
     faraday-multipart (1.0.4)
       multipart-post (~> 2)
-    faraday-net_http (3.1.0)
-      net-http
     ffi (1.17.0-aarch64-linux-gnu)
     ffi (1.17.0-arm64-darwin)
     ffi (1.17.0-x86_64-linux-gnu)
@@ -381,8 +380,6 @@ GEM
     multipart-post (2.4.1)
     mutex_m (0.2.0)
     nenv (0.3.0)
-    net-http (0.4.1)
-      uri
     net-imap (0.4.13)
       date
       net-protocol
@@ -581,6 +578,7 @@ GEM
     ruby-progressbar (1.13.0)
     ruby-vips (2.2.1)
       ffi (~> 1.12)
+    ruby2_keywords (0.0.5)
     searchkick (5.3.1)
       activemodel (>= 6.1)
       hashie
@@ -654,7 +652,6 @@ GEM
     tzinfo (2.0.6)
       concurrent-ruby (~> 1.0)
     unicode-display_width (2.5.0)
-    uri (0.13.0)
     view_component (3.12.1)
       activesupport (>= 5.2.0, < 8.0)
       concurrent-ruby (~> 1.0)
57 changes: 38 additions & 19 deletions api/app/models/external_catalog/el_recodo/song_scraper.rb
@@ -23,15 +23,17 @@ class PageNotFoundError < StandardError; end
       :page_updated_at
     ).freeze

-    def initialize(music_id:)
-      @music_id = music_id
+    def initialize(email:, password:)
+      @cookie = login(email:, password:)
     end

-    def metadata
+    def fetch(music_id:)
+      @music_id = music_id
+      binding.irb
       Metadata.new(
         date:,
         ert_number:,
-        music_id: @music_id,
+        music_id:,
         title:,
         style:,
         orchestra:,
@@ -49,9 +51,29 @@ def metadata

     private

+    def login(email:, password:)
+      response = Faraday.post("https://www.el-recodo.com/connect?lang=en") do |req|
+        req.headers["User-Agent"] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Safari/605.1.15"
+        req.headers["Content-Type"] = "application/x-www-form-urlencoded"
+        req.headers["Accept"] = "*/*"
+        req.headers["Connection"] = "keep-alive"
+
+        req.body = {
+          "wish" => "logged",
+          "email" => email,
+          "pwd" => password,
+          "autologin" => "1",
+          "backurl" => ""
+        }
+      end
+
+      response.headers["Set-Cookie"]
+    end
+
     def faraday
       @faraday ||= Faraday.new do |conn|
         conn.headers["User-Agent"] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Safari/605.1.15"
+        conn.headers["Cookie"] = @cookie if @cookie
       end
     end
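
Review note on `login`: the request sets `Content-Type: application/x-www-form-urlencoded` but assigns a Hash as the body. Whether Faraday encodes that Hash depends on the request middleware of the connection in use, which this diff does not show. A minimal sketch of encoding the body explicitly with the standard library follows; `login_form_body` is a hypothetical helper, and only the field names come from the diff.

```ruby
require "uri"

# Hypothetical helper (not in the PR): builds the urlencoded login body
# from the field names used in the diff ("wish", "email", "pwd", ...).
def login_form_body(email:, password:)
  URI.encode_www_form({
    "wish" => "logged",
    "email" => email,
    "pwd" => password,
    "autologin" => "1",
    "backurl" => ""
  })
end

# Placeholder credentials, for illustration only.
puts login_form_body(email: "user@example.com", password: "p&ss word")
# prints wish=logged&email=user%40example.com&pwd=p%26ss+word&autologin=1&backurl=
```

Assigning the resulting String to `req.body` would make the payload independent of whichever middleware stack the connection happens to have.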

@@ -136,24 +158,21 @@ def safe_parse_date(date_string)
     end

     def parse_html
-      @parsed_html ||= Nokogiri::HTML(fetch_page.body)
+      page = faraday.get("https://www.el-recodo.com/music?id=#{@music_id}&lang=en")
+      @parsed_html ||= Nokogiri::HTML(page.body)
     end

-    def fetch_page
-      begin
-        response = faraday.get("https://www.el-recodo.com/music?id=#{@music_id}&lang=en")
-        if response.status == 429
-          Rails.logger.error("El Recodo Song Scraper: Too Many Requests")
-          raise TooManyRequestsError
-        elsif response.status != 200
-          Rails.logger.error("El Recodo Song Scraper: Page Not Found")
-          raise PageNotFoundError
-        end
-      rescue Faraday::Error => e
-        Rails.logger.error("El Recodo Song Scraper: #{e.message}")
-        raise
+    def extract_musicians
+      musicians = {}
+      instruments = ["PIANO", "DOUBLEBASS", "BANDONEON", "VIOLIN", "ARRANGER"]
+
+      instruments.each do |instrument|
+        instrument_data = parse_html.at_xpath("//text()[contains(.,'#{instrument}')]").parent
+        musician_links = instrument_data.css("a")
+        musicians[instrument.downcase.to_sym] = musician_links.map { |link| {name: link.text, url: link["href"]} }
       end
-      response
+
+      musicians
     end
   end
 end
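
Review note on the cookie handling: `login` returns the raw `Set-Cookie` response header, and `faraday` sends that value back unchanged as the `Cookie` request header. `Set-Cookie` values normally carry attributes such as `path` or `expires` that are not part of a `Cookie` header. Whether the site tolerates them is not visible in this diff. A minimal sketch of keeping only the `name=value` pairs follows; `cookie_header_from` is a hypothetical helper and the cookie names in the example are made up.

```ruby
# Hypothetical helper (not in the PR): reduces one or more Set-Cookie
# header values to the "name=value" pairs a Cookie request header expects.
def cookie_header_from(set_cookie_values)
  Array(set_cookie_values)
    .map { |value| value.split(";", 2).first.to_s.strip } # drop attributes
    .reject(&:empty?)
    .join("; ")
end

# Made-up cookie names and values, for illustration only.
raw = [
  "PHPSESSID=abc123; path=/; HttpOnly",
  "autologin=xyz; expires=Wed, 23 Jul 2025 10:00:00 GMT; path=/"
]
puts cookie_header_from(raw)
# prints PHPSESSID=abc123; autologin=xyz
```

The helper takes an array because how an HTTP client exposes several `Set-Cookie` headers differs between clients. A single String works too.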
3 changes: 3 additions & 0 deletions api/config/initializers/faraday.rb
@@ -0,0 +1,3 @@
+Faraday.default_connection = Faraday.new do |conn|
+  conn.response :raise_error
+end
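
Review note: this initializer installs `raise_error` on the default connection, while the scraper diff removes the explicit 429 and non-200 checks from `fetch_page`. Whether the scraper's own `Faraday.new` connection picks up this middleware is not visible in this diff. The commit list also mentions adding a sleep after a "too many requests" error. A minimal, library-free sketch of that pattern follows. The error class names are taken from the scraper, while `check_status!`, `with_throttle_retry`, and the default values are hypothetical.

```ruby
# Stand-ins for the error classes named in the scraper diff.
class TooManyRequestsError < StandardError; end
class PageNotFoundError < StandardError; end

# Hypothetical helper mirroring the status checks removed from fetch_page.
def check_status!(status)
  raise TooManyRequestsError if status == 429
  raise PageNotFoundError unless status == 200
  status
end

# Hypothetical helper: re-runs the block after a pause when throttled.
# `sleeper` is injectable so specs do not have to wait for real.
def with_throttle_retry(max_attempts: 3, wait_seconds: 10, sleeper: ->(s) { sleep(s) })
  attempts = 0
  begin
    attempts += 1
    yield
  rescue TooManyRequestsError
    raise if attempts >= max_attempts
    sleeper.call(wait_seconds)
    retry
  end
end

statuses = [429, 429, 200] # simulated responses
waits = []
result = with_throttle_retry(sleeper: ->(s) { waits << s }) { check_status!(statuses.shift) }
puts result        # prints 200
puts waits.inspect # prints [10, 10]
```

Only the throttling error is retried. A `PageNotFoundError` propagates immediately, which matches the distinction the removed code drew between the two cases.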
3 changes: 3 additions & 0 deletions api/db/migrate/20240114234658_create_el_recodo_songs.rb
@@ -15,7 +15,10 @@ def change
t.string :label
t.jsonb :members, null: false, default: "{}"
t.text :lyrics
t.integer :lyrics_year
t.string :search_data
t.string :matrix
t.string :disk
t.index :ert_number
t.index :music_id, unique: true
t.index :synced_at
17 changes: 17 additions & 0 deletions api/db/migrate/20240114234659_create_el_recodo_musicians.rb
jmarsh24 marked this conversation as resolved.
@@ -0,0 +1,17 @@
+class CreateElRecodoPersons < ActiveRecord::Migration[7.1]
+  def change
+    create_table :el_recodo_persons, id: :uuid do |t|
+      t.date :birth_date
+      t.date :death_date
+      t.string :real_name
+      t.string :nicknames, array: true
+      t.string :place_of_birth
+      t.string :url
+      t.string :image_url
+      t.datetime :synced_at, null: false, default: -> { "CURRENT_TIMESTAMP" }
+      t.datetime :page_updated_at, null: false, default: -> { "CURRENT_TIMESTAMP" }
+
+      t.timestamps
+    end
+  end
+end
8 changes: 4 additions & 4 deletions api/spec/models/import/el_recodo/song_scraper_spec.rb
@@ -4,13 +4,13 @@
   describe "#metadata" do
     context "for normal songs" do
       before do
-        music_1_html = Rails.root.join("spec/fixtures/html/el_recodo_music_id_1.html")
-        stub_request(:get, "https://www.el-recodo.com/music?id=1&lang=en")
-          .to_return(status: 200, body: File.read(music_1_html))
+        # music_1_html = Rails.root.join("spec/fixtures/html/el_recodo_music_id_1.html")
+        # stub_request(:get, "https://www.el-recodo.com/music?id=1&lang=en")
+        # .to_return(status: 200, body: File.read(music_1_html))
       end

       it "fetches and parses song metadata correctly" do
-        metadata = described_class.new(music_id: 1).metadata
+        metadata = Import::ElRecodo::SongScraper.new(email: "dogacozen87@gmail.com", password: "myNewPass123").fetch(music_id: 3573)

         expect(metadata.ert_number).to eq(1)
         expect(metadata.title).to eq("Te burlas tristeza")
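
Review note: at this point in the branch the spec passes literal credentials and comments out the request stub, so it would talk to the live site. The commit list mentions "add credentials", so the project may already have its own mechanism. As one possible alternative, here is a minimal sketch of reading them from the environment. `el_recodo_credentials` and the variable names `EL_RECODO_EMAIL` / `EL_RECODO_PASSWORD` are assumptions, not part of the PR.

```ruby
# Hypothetical helper (not in the PR): reads scraper credentials from the
# environment. The variable names are assumptions made for this sketch.
# `env` defaults to ENV but accepts any Hash-like object, which keeps the
# helper easy to exercise in a spec.
def el_recodo_credentials(env = ENV)
  {
    email: env.fetch("EL_RECODO_EMAIL"),       # raises KeyError when unset
    password: env.fetch("EL_RECODO_PASSWORD")
  }
end

# Placeholder values, for illustration only.
creds = el_recodo_credentials(
  {"EL_RECODO_EMAIL" => "user@example.com", "EL_RECODO_PASSWORD" => "secret"}
)
puts creds[:email] # prints user@example.com
```

Using `fetch` rather than `[]` makes a missing variable fail loudly at lookup time instead of sending a `nil` email to the login endpoint.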
6 changes: 3 additions & 3 deletions api/spec/rails_helper.rb
@@ -25,9 +25,9 @@
   config.infer_spec_type_from_file_location!
   config.filter_rails_from_backtrace!

-  config.before(:suite) do
-    WebMock.disable_net_connect!(allow: ["elasticsearch", "localhost"])
-  end
+  # config.before(:suite) do
+  # WebMock.disable_net_connect!(allow: ["elasticsearch", "localhost"])
+  # end

   config.before(:each) do
     ActiveStorage::Current.url_options = {host: "example.com"}