Add geocoding function #1

Open
new-village opened this issue Jan 14, 2023 · 4 comments

Comments

@new-village
Owner

new-village commented Jan 14, 2023

Add latitude and longitude data, geocoded from the concatenated "prefecture_name", "city_name" and "street_number" fields.
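A minimal sketch of how the address string could be built before geocoding. The helper name `build_address` and the `address` column are assumptions for illustration, and the sample values are copied from rows quoted later in this thread.

```python
import pandas as pd


def build_address(df: pd.DataFrame) -> pd.Series:
    """Concatenate the three address fields into one string per row.

    Missing parts are replaced with empty strings first, so a missing
    street_number does not end up as the literal text "None" or as NaN.
    """
    parts = df[["prefecture_name", "city_name", "street_number"]].fillna("")
    return parts["prefecture_name"] + parts["city_name"] + parts["street_number"]


# Illustrative rows; the second one has no street_number.
sample = pd.DataFrame({
    "prefecture_name": ["島根県", "島根県"],
    "city_name": ["松江市", "松江市"],
    "street_number": ["白潟本町75-11", None],
})
sample["address"] = build_address(sample)
print(sample["address"].tolist())
```

Filling missing values before concatenating matters here because "street_number" is not populated for every record.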

@new-village
Owner Author

First, we should confirm whether the "en_city_name" field is usable.
If it is, we can get high-accuracy geocoding data from OpenStreetMap or other foreign geocoding services.

@new-village
Owner Author

new-village commented Jan 15, 2023

We won't use the English address info, because en_city_name is populated for only about 0.3% of records (64 of 21,927) in Shimane prefecture.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21927 entries, 0 to 21926
Data columns (total 30 columns):
 #   Column                      Non-Null Count  Dtype 
---  ------                      --------------  ----- 
 0   sequence_number             21927 non-null  object
 1   corporate_number            21927 non-null  object
 2   process                     21927 non-null  object
 3   correct                     21927 non-null  object
 4   update_date                 21927 non-null  object
 5   change_date                 21927 non-null  object
 6   name                        21927 non-null  object
 7   name_image_id               354 non-null    object
 8   kind                        21927 non-null  object
 9   prefecture_name             21927 non-null  object
 10  city_name                   21927 non-null  object
 11  street_number               21862 non-null  object
 12  address_image_id            88 non-null     object
 13  prefecture_code             21927 non-null  object
 14  city_code                   21927 non-null  object
 15  post_code                   21927 non-null  object
 16  address_outside             0 non-null      object
 17  address_outside_image_id    0 non-null      object
 18  close_date                  2174 non-null   object
 19  close_cause                 2174 non-null   object
 20  successor_corporate_number  172 non-null    object
 21  change_cause                275 non-null    object
 22  assignment_date             21927 non-null  object
 23  latest                      21927 non-null  object
 24  en_name                     64 non-null     object
 25  en_prefecture_name          64 non-null     object
 26  en_city_name                64 non-null     object
 27  en_address_outside          0 non-null      object
 28  furigana                    10648 non-null  object
 29  hihyoji                     21927 non-null  object
dtypes: object(30)
memory usage: 5.0+ MB
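The fill rate of a column can be checked directly from the non-null counts. The sketch below is only an illustration: `coverage` is a hypothetical helper, and the toy frame reproduces the counts from the listing above (64 non-null values in 21927 rows) rather than the real data.

```python
import pandas as pd


def coverage(df: pd.DataFrame, column: str) -> float:
    """Return the share of non-null values in `column`, as a percentage."""
    return df[column].notna().mean() * 100


# Toy frame with the same counts as the listing: 64 filled, the rest missing.
toy = pd.DataFrame({"en_city_name": ["placeholder"] * 64 + [None] * (21927 - 64)})
pct = coverage(toy, "en_city_name")
print(f"{pct:.2f}%")  # prints 0.29%
```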

@new-village
Owner Author

Parsed the address info; the result is below.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21927 entries, 0 to 21926
Data columns (total 37 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   sequence_number             21927 non-null  object 
 1   corporate_number            21927 non-null  object 
 2   process                     21927 non-null  object 
 3   correct                     21927 non-null  object 
 4   update_date                 21927 non-null  object 
 5   change_date                 21927 non-null  object 
 6   name                        21927 non-null  object 
 7   name_image_id               354 non-null    object 
 8   kind                        21927 non-null  object 
 9   prefecture_name             21927 non-null  object 
 10  city_name                   21927 non-null  object 
 11  street_number               21862 non-null  object 
 12  address_image_id            88 non-null     object 
 13  prefecture_code             21927 non-null  object 
 14  city_code                   21927 non-null  object 
 15  post_code                   21927 non-null  object 
 16  address_outside             0 non-null      object 
 17  address_outside_image_id    0 non-null      object 
 18  close_date                  2174 non-null   object 
 19  close_cause                 2174 non-null   object 
 20  successor_corporate_number  172 non-null    object 
 21  change_cause                275 non-null    object 
 22  assignment_date             21927 non-null  object 
 23  latest                      21927 non-null  object 
 24  en_name                     64 non-null     object 
 25  en_prefecture_name          64 non-null     object 
 26  en_city_name                64 non-null     object 
 27  en_address_outside          0 non-null      object 
 28  furigana                    10648 non-null  object 
 29  hihyoji                     21927 non-null  object 
 30  pref                        21927 non-null  object 
 31  city                        21927 non-null  object 
 32  town                        21927 non-null  object 
 33  addr                        21927 non-null  object 
 34  lat                         21682 non-null  float64
 35  lng                         21682 non-null  float64
 36  level                       21927 non-null  int64  
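
The thread does not say which parser produced the seven new columns, so the following is only a sketch of how such a result could be attached to the frame. It assumes a hypothetical `normalize` callable that takes an address string and returns a dict with the keys `pref`, `city`, `town`, `addr`, `lat`, `lng` and `level`; `fake_normalize` is a stand-in with dummy values, not a real geocoder.

```python
from typing import Any, Callable, Dict

import pandas as pd

PARSED_COLUMNS = ["pref", "city", "town", "addr", "lat", "lng", "level"]


def add_parsed_columns(
    df: pd.DataFrame,
    address_col: str,
    normalize: Callable[[str], Dict[str, Any]],
) -> pd.DataFrame:
    """Run `normalize` on every address and join the result as new columns."""
    parsed = df[address_col].map(normalize)
    expanded = pd.DataFrame(parsed.tolist(), index=df.index, columns=PARSED_COLUMNS)
    # Coordinates may come back as None; coerce them to float so they become NaN.
    for col in ("lat", "lng"):
        expanded[col] = pd.to_numeric(expanded[col], errors="coerce")
    return df.join(expanded)


def fake_normalize(address: str) -> Dict[str, Any]:
    """Hypothetical stand-in: 'resolves' only addresses that contain a digit."""
    found = any(ch.isdigit() for ch in address)
    return {
        "pref": "島根県", "city": "松江市", "town": "", "addr": address[6:],
        # 35.0 / 133.0 are dummy coordinates, not real geocoding results.
        "lat": 35.0 if found else None,
        "lng": 133.0 if found else None,
        "level": 3 if found else 2,  # made-up values for illustration
    }


sample = pd.DataFrame({"address": ["島根県松江市白潟本町75-11", "島根県松江市"]})
result = add_parsed_columns(sample, "address", fake_normalize)
print(result[["addr", "lat", "lng", "level"]])
```

Passing the parser in as an argument keeps the column-expansion step independent of whichever geocoding library is actually used.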

@new-village
Owner Author

We confirmed that the Shimane data set contains 245 records with invalid or unparsed addresses (rows with no lat/lng). Some examples:

3157	有限会社キリン洋傘店	島根県	松江市		白潟本町75-11	NaN	NaN	2
3176	三栄建設有限会社	島根県	松江市		上乃木町1980	NaN	NaN	2
3298	山新商事有限会社	島根県	松江市		上乃木町3244-2	NaN	NaN	2
3434	有限会社石昭	島根県	松江市		白潟本町69-3	NaN	NaN	2
3951	有限会社イワミ精機	島根県	鹿足郡吉賀町		柿木539-1	NaN	NaN	2
3952	有限会社斎藤石油店	島根県	鹿足郡吉賀町		柿木485-1	NaN	NaN	2
3973	有限会社小笠原	島根県	鹿足郡吉賀町		柿木642-2	NaN	NaN	2
4091	藤澤合名会社	島根県	松江市		None	NaN	NaN	2
4092	合資会社出雲合同自動車	島根県	松江市		None	NaN	NaN	2
4093	合資会社木村盛文館	島根県	松江市		None	NaN	NaN	2
4094	合資会社山陰秩父商会	島根県	松江市		None	NaN	NaN	2
4095	大栄商工合資会社	島根県	松江市		None	NaN	NaN	2
4096	松江商事合資会社	島根県	松江市		None	NaN	NaN	2
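
Rows like the ones above can be pulled out by filtering on the missing coordinates. This is a minimal sketch, assuming the frame has the `lat`, `lng` and `level` columns shown earlier; `unresolved` is a hypothetical helper, and the third toy row (including its coordinates and level) is made up so that the filter has something to exclude.

```python
import pandas as pd


def unresolved(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows for which geocoding produced no coordinates."""
    return df[df["lat"].isna() | df["lng"].isna()]


nan = float("nan")
toy = pd.DataFrame({
    "name": ["有限会社キリン洋傘店", "藤澤合名会社", "example corp (hypothetical)"],
    "city": ["松江市", "松江市", "松江市"],
    "addr": ["白潟本町75-11", None, "dummy"],
    "lat": [nan, nan, 35.0],    # 35.0 is a dummy value
    "lng": [nan, nan, 133.0],   # 133.0 is a dummy value
    "level": [2, 2, 3],
})

missing = unresolved(toy)
print(len(missing))                               # prints 2
print(missing["level"].value_counts().to_dict())  # prints {2: 2}
```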
