<h3><font color='#f54251'> General information</font></h3>

    Global Navigation Satellite Systems (GNSS) provide raw signals, which the phone's GPS chipset uses to compute a position. Current mobile phones offer only 3-5 meters of positioning accuracy. While useful in many cases, this can create a “jumpy” experience, and for many use cases the results are neither fine-grained nor stable enough to be reliable. Machine learning and precision GNSS algorithms are expected to improve this accuracy and give billions of Android phone users a more fine-tuned positioning experience.
        

<h3><font color='#f54251'> Data description</font></h3>

    We will use data collected from the host team’s own Android phones to compute location down to decimeter or even centimeter resolution, if possible. We will have access to precise ground truth, raw GPS measurements, and assistance data from nearby GPS stations to train and test our submissions.
    
    This challenge provides data from a variety of instruments useful for determining a phone's position: signals from GPS satellites, accelerometer readings, gyroscope readings, and more.

    Because this challenge is designed around post-processing applications such as lane-level mapping, future data along a route will be available to generate positions as precisely as possible. You may also make use of information from neighboring phones to aid your estimation, as many routes may be represented by multiple phones. To encourage the development of a general GNSS positioning algorithm, in-phone GPS chipset locations will not be provided, as they are derived from a manufacturer-proprietary algorithm that varies by phone model and other factors.
    
       For more about the data, see https://www.kaggle.com/c/google-smartphone-decimeter-challenge/data
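Since post-processing lets us use future samples along a route, one simple illustration is a centered moving average, where each smoothed point averages the samples before and after it. This is only a minimal sketch on a made-up toy track, not the competition's method. The column `millisSinceGpsEpoch`, the helper `smooth_track` and the window size are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical toy track for one phone (values are made up, not from the dataset).
track = pd.DataFrame({
    "millisSinceGpsEpoch": [0, 1000, 2000, 3000, 4000],  # assumed timestamp column
    "latDeg": [37.0000, 37.0002, 37.0001, 37.0003, 37.0004],
    "lngDeg": [-122.0000, -122.0001, -122.0003, -122.0002, -122.0004],
})

def smooth_track(df, window=3):
    """Centered moving average: each point uses past and future samples."""
    out = df.sort_values("millisSinceGpsEpoch").copy()
    for col in ["latDeg", "lngDeg"]:
        # center=True makes the window straddle the current row;
        # min_periods=1 keeps the first and last rows instead of returning NaN
        out[col] = out[col].rolling(window, center=True, min_periods=1).mean()
    return out

smoothed = smooth_track(track)
print(smoothed)
```

A real-time positioning algorithm could not do this, because it only sees past samples. That is the main advantage of the post-processing setting.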


<h3><font color='#42f5c5'>Exploring data</font></h3>

<h3><font color='#42d7f5'>Reading data</font></h3>

In [None]:
import pandas as pd # pandas is used to read and manipulate data 
import os
data_path = '/kaggle/input/google-smartphone-decimeter-challenge' # specifying the data path
print(os.listdir(data_path))

In [None]:
train_location = pd.read_csv(data_path+'/baseline_locations_train.csv') # read_csv is used to read the csv files
test_location = pd.read_csv(data_path+'/baseline_locations_test.csv')

In [None]:
train_location.columns # .columns is used to print features in our dataset

        We can see features such as latitude and longitude are present

In [None]:
train_location.head() # head is used to show the first five rows of the data

        Latitude and longitude are given in decimal degrees
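To get a feel for what decimal degrees mean in meters, here is a small sketch of the haversine great-circle distance between two latitude/longitude points. The function name `haversine_m` and the sample coordinates are made up for illustration, and the Earth is approximated as a sphere with a mean radius of 6,371 km.

```python
import math

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    earth_radius_m = 6_371_000.0  # mean Earth radius, spherical approximation
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

# Two made-up points that differ by 0.00001 degrees of latitude
d = haversine_m(37.45000, -122.15, 37.45001, -122.15)
print(f"{d:.2f} m")
```

A change of 0.00001 degrees of latitude is roughly one meter, so decimeter-level accuracy needs about six decimal places.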

<h3><font color='#42d7f5'>Checking for nan values</font></h3>

In [None]:
train_location.isna().any() # isna is used to check for nan values

    There are no null values in the training set 
    
    See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isna.html for more details
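`isna().any()` only tells us whether a column has a missing value. If a column does have some, `isna().sum()` gives the count per column. A minimal sketch on a made-up toy frame (the values are not from the dataset):

```python
import numpy as np
import pandas as pd

# Toy frame with one missing latitude (made-up values)
toy = pd.DataFrame({
    "latDeg": [37.4, np.nan, 37.5],
    "lngDeg": [-122.1, -122.2, -122.3],
})

nan_counts = toy.isna().sum()  # number of NaN values in each column
print(nan_counts)
```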

<h3><font color='#42d7f5'>Exploring collectionName features</font></h3>

        collectionName is the name of the grandparent folder

In [None]:
train_location.collectionName.nunique() # nunique is used to find the total number of unique items 

    There are 29 unique folders 

In [None]:
import plotly.express as pe # plotly.express is used for beautiful visualization
parent_folder = train_location.groupby("collectionName")['collectionName'].count().reset_index(name = 'Count of files in each folder')
bar_chart = pe.bar(parent_folder, x='collectionName', y='Count of files in each folder', title = 'Number of files in each folder', color_discrete_sequence=['orange']) # one colour is enough, plotly cycles through the sequence
bar_chart.show()

        Most of the parent folders have more than 4000 rows (labelled as files in the chart); 2021-01-04-us-rwc-1 has the most 
        
        See the links below
        
        * https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
        * https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.count.html
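The per-folder counts can also be obtained with `value_counts`, which is a shorter way to write the same groupby and count. A minimal sketch on a made-up toy column (the folder names `a`, `b`, `c` are placeholders):

```python
import pandas as pd

# Toy column with placeholder folder names
toy = pd.DataFrame({"collectionName": ["a", "a", "b", "a", "b", "c"]})

# Same pattern as the groupby/count used for the bar chart
by_group = toy.groupby("collectionName")["collectionName"].count()

# Equivalent shortcut; sort_index puts the folders in the same order
by_counts = toy["collectionName"].value_counts().sort_index()

print(by_group.to_dict())
print(by_counts.to_dict())
```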

<h3><font color='#42d7f5'>Exploring phoneName features</font></h3>

In [None]:
train_location.phoneName.nunique() # nunique is used to find the total number of unique items 

    There are 7 unique phones used for this data collection

In [None]:
import plotly.express as pe # plotly.express is used for beautiful visualization
phones = train_location.groupby('phoneName')['phoneName'].count().reset_index(name='count of phone used')
pie_chart = pe.pie(phones, values='count of phone used', names='phoneName', title='Phones used in data collection')
pie_chart.show()

        The Pixel4 is the most used phone, accounting for around 36.7 % of the collected data.
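The percentages shown in the pie chart can also be computed directly with `value_counts(normalize=True)`. A minimal sketch on a made-up toy column; `PhoneB` and `PhoneC` are placeholder names, and the resulting shares are not the real ones:

```python
import pandas as pd

# Toy column; the counts are made up and do not match the dataset
toy = pd.DataFrame({"phoneName": ["Pixel4", "Pixel4", "PhoneB", "Pixel4", "PhoneC"]})

# normalize=True returns fractions, so multiply by 100 for percentages
share = toy["phoneName"].value_counts(normalize=True) * 100
print(share.round(1))
```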

<h3><font color='#42d7f5'>Exploring latitude and longitude features</font></h3>

In [None]:
# Reference code : https://www.kaggle.com/tomwarrens/first-eda-with-geocoordinates  by Tommaso Guerrini
import folium # folium draws interactive maps
maps = folium.Map(location=[37.453128,-122.154313], tiles='openstreetmap', zoom_start = 10) # creating a map centered on the specified location
sample_locations = train_location.sample(200).reset_index(drop = True) # randomly sampling 200 data points
for j in range(len(sample_locations)): # for every sampled data point 
    try:
        # fill_color, num_sides and radius were dropped: they are polygon-marker options, not Marker options
        folium.Marker(location=[sample_locations['latDeg'][j], # marking the location
                                sample_locations['lngDeg'][j]],
                        popup=sample_locations['collectionName'][j],
                        icon = folium.Icon(prefix = 'fa', icon = "map-pin", color = 'blue')).add_to(maps)
    except (TypeError, ValueError): # skip rows with invalid coordinates instead of hiding every error
        continue
        
maps

        Most of the sampled locations are in the US.
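The map only shows a random sample, so a quick numeric check is to count how many points fall inside a bounding box. This is a rough sketch on made-up toy points. The box limits `US_LAT` and `US_LNG` are an approximate assumption for the contiguous US, not official boundaries.

```python
import pandas as pd

# Toy points (made up): two in California, one in London
toy = pd.DataFrame({
    "latDeg": [37.45, 37.33, 51.50],
    "lngDeg": [-122.15, -121.89, -0.12],
})

# Approximate bounding box for the contiguous US (assumed limits)
US_LAT = (24.5, 49.5)
US_LNG = (-125.0, -66.5)

in_box = toy["latDeg"].between(*US_LAT) & toy["lngDeg"].between(*US_LNG)
share_in_box = in_box.mean()  # fraction of points inside the box
print(f"{share_in_box:.0%} of the toy points fall inside the box")
```

The same two lines can be run on `train_location` to check all rows instead of a sample of 200.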

<h3><font color='#32a89e'>More work in progress. If you like my kernel, please upvote it!</font></h3>