# String Operations

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/giswqs/geog-312/blob/main/book/python/04_string_operations.ipynb)

## Overview

This lecture will cover various string operations in Python, with a focus on their application in geospatial contexts. Strings are fundamental in handling textual data, such as names of geographic locations, coordinates, and data extracted from text files. Mastering string operations allows you to effectively manipulate and analyze geographic information, which is essential for tasks like data cleaning, formatting, and parsing.

## Learning Objectives

By the end of this lecture, you should be able to:

- Create and manipulate strings in Python, including concatenation and repetition.
- Apply string methods such as `lower()`, `upper()`, `strip()`, `replace()`, and `split()` to process geospatial data.
- Format strings using the `format()` method and f-strings to include variable data within strings.
- Parse and extract specific information from strings, such as coordinates or location names.
- Utilize string operations in practical geospatial tasks, enhancing your ability to work with and manage geographic data.

## Creating and Manipulating Strings

Strings in Python are sequences of characters. You can create a string by enclosing characters in single or double quotes.

In [2]:
location_name = "Mount Everest"  # A string representing the name of a location

You can concatenate (join) strings using the `+` operator:

In [4]:
location_name_full = location_name + ", Nepal"
print(location_name_full)

Mount Everest, Nepal


You can also repeat strings using the `*` operator:

In [5]:
separator = "-" * 10
print(separator)

----------
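
The `+` operator only joins strings to other strings, so numbers must be converted first. The sketch below illustrates this and also shows `str.join()` for combining several pieces at once. The `elevation_m` value and the `parts` list are hypothetical and not part of the original lecture.

```python
location_name = "Mount Everest"
elevation_m = 8849  # hypothetical value in metres, for illustration only

# "+" raises TypeError if one operand is not a string, so convert with str()
label = location_name + ": " + str(elevation_m) + " m"
print(label)  # Mount Everest: 8849 m

# str.join() concatenates a list of strings with a common separator
parts = [location_name, "Nepal", "Asia"]  # hypothetical list of name parts
joined = ", ".join(parts)
print(joined)  # Mount Everest, Nepal, Asia
```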


## String Methods for Geospatial Data

Python provides various built-in methods to manipulate strings. Some commonly used methods include:

- `lower()`, `upper()`: Convert strings to lowercase or uppercase.
- `strip()`: Remove leading and trailing whitespace.
- `replace()`: Replace a substring with another substring.
- `split()`: Split a string into a list of substrings based on a delimiter.

In [6]:
location_name_upper = location_name.upper()
print(location_name_upper)  # Convert to uppercase

MOUNT EVEREST


In [10]:
location_name_clean = location_name.strip()
print(location_name_clean)  # Remove leading/trailing whitespace
repr(location_name_clean)

Mount Everest


"'Mount Everest'"

In [11]:
location_name_replaced = location_name.replace("Everest", "K2")
print(location_name_replaced)  # Replace 'Everest' with 'K2'

Mount K2


In [12]:
location_parts = location_name_full.split(", ")
print(location_parts)  # Split the string into a list

['Mount Everest', 'Nepal']
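
These methods can be chained, because each one returns a new string. As a rough sketch, the hypothetical helper `normalize_name` below combines `strip()`, `lower()`, and `replace()` to turn inconsistently typed place names into one comparable key. The `raw_names` list is made-up sample input.

```python
raw_names = ["  Mount Everest ", "mount everest", "MOUNT EVEREST\n"]  # hypothetical messy input


def normalize_name(name):
    """Return a lowercase, underscore-separated key for a place name."""
    # strip() removes leading/trailing whitespace (including newlines),
    # lower() makes the comparison case-insensitive,
    # replace() swaps each space for an underscore.
    return name.strip().lower().replace(" ", "_")


keys = [normalize_name(name) for name in raw_names]
print(keys)            # ['mount_everest', 'mount_everest', 'mount_everest']
print(len(set(keys)))  # 1, because all three spellings map to the same key
```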


## Formatting Strings

String formatting is essential when preparing data for output or when you need to include variable values in strings. You can use the `format()` method or f-strings (in Python 3.6 and above) for string formatting.

In [13]:
latitude = 27.9881
longitude = 86.9250
formatted_coordinates = "Coordinates: ({}, {})".format(latitude, longitude)
print(formatted_coordinates)

Coordinates: (27.9881, 86.925)


In [14]:
formatted_coordinates_fstring = f"Coordinates: ({latitude}, {longitude})"
print(formatted_coordinates_fstring)

Coordinates: (27.9881, 86.925)
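
Note that the output above shows `86.925` rather than `86.9250`, because a float does not remember trailing zeros. If a fixed number of decimal places is wanted, a format specification such as `.4f` can be added inside the braces. The variable names `fixed` and `fixed_format` in this sketch are illustrative.

```python
latitude = 27.9881
longitude = 86.9250

# ".4f" formats each float with exactly four digits after the decimal point
fixed = f"Coordinates: ({latitude:.4f}, {longitude:.4f})"
print(fixed)  # Coordinates: (27.9881, 86.9250)

# The same format specification works with str.format()
fixed_format = "Coordinates: ({:.4f}, {:.4f})".format(latitude, longitude)
print(fixed_format)  # Coordinates: (27.9881, 86.9250)
```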


## Parsing and Extracting Information from Strings

Often, you will need to extract specific information from strings, especially when dealing with geographic data. For example, you might need to extract coordinates from a formatted string.

In [15]:
coordinate_string = "27.9881N, 86.9250E"
lat_str, lon_str = coordinate_string.split(", ")  # Split into a list of two strings, then unpack them
latitude = float(lat_str[:-1])  # Drop the trailing 'N', then convert the rest to a float
longitude = float(lon_str[:-1])  # Drop the trailing 'E', then convert the rest to a float
print(f"Parsed coordinates: ({latitude}, {longitude})")

Parsed coordinates: (27.9881, 86.925)
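
The slicing approach above discards the hemisphere letter, which is fine for N and E but loses information for S and W. The sketch below keeps the letter and uses it to set the sign. It assumes the common convention that southern latitudes and western longitudes are negative in decimal degrees. The helpers `parse_coordinate` and `parse_lat_lon` are hypothetical names, not part of the original lecture.

```python
def parse_coordinate(text):
    """Convert a string such as '27.9881N' into a signed float."""
    text = text.strip()            # tolerate surrounding whitespace
    hemisphere = text[-1].upper()  # last character is the hemisphere letter
    if hemisphere not in {"N", "S", "E", "W"}:
        raise ValueError(f"Unknown hemisphere in {text!r}")
    value = float(text[:-1])       # everything before the letter is the number
    # Assumed convention: south and west are negative
    return -value if hemisphere in {"S", "W"} else value


def parse_lat_lon(text):
    """Parse a 'latitude, longitude' string into a (lat, lon) tuple."""
    lat_str, lon_str = text.split(",")
    return parse_coordinate(lat_str), parse_coordinate(lon_str)


print(parse_lat_lon("27.9881N, 86.9250E"))  # (27.9881, 86.925)
print(parse_lat_lon("40.7128N, 74.0060W"))  # (40.7128, -74.006)
```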


## Exercises

1. Create a string representing the name of a city. Convert the string to lowercase and then to uppercase.
2. Take a string with the format 'latitude, longitude' (e.g., '40.7128N, 74.0060W') and extract the numeric values of latitude and longitude.
3. Create a formatted string that includes the name of a location and its coordinates. Use both the `format()` method and f-strings to achieve this.
4. Replace a substring in the name of a place (e.g., change 'San Francisco' to 'San Diego') and print the result.

In [16]:
# Exercise 1: name of a city, converted to lowercase and then to uppercase
city = "Bogota"
print(city.lower())
print(city.upper())


bogota
BOGOTA


In [22]:
dir(str)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'stri

In [19]:
help(str.split)

Help on method_descriptor:

split(self, /, sep=None, maxsplit=-1) unbound builtins.str method
    Return a list of the substrings in the string, using sep as the separator string.
    
      sep
        The separator used to split the string.
    
        When set to None (the default value), will split on any whitespace
        character (including \n \r \t \f and spaces) and will discard
        empty strings from the result.
      maxsplit
        Maximum number of splits.
        -1 (the default value) means no limit.
    
    Splitting starts at the front of the string and works to the end.
    
    Note, str.split() is mainly useful for data that has been intentionally
    delimited.  With natural text that includes punctuation, consider using
    the regular expression module.
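
The `sep` and `maxsplit` parameters described in the help text can be tried out as follows. This is a small sketch, and the `record` string is a hypothetical example.

```python
record = "Mount Everest, Nepal, 27.9881N, 86.9250E"  # hypothetical record

# Explicit separator: split at every ", "
fields = record.split(", ")
print(fields)  # ['Mount Everest', 'Nepal', '27.9881N', '86.9250E']

# maxsplit=1 stops after the first split, leaving the remainder intact
name, rest = record.split(", ", maxsplit=1)
print(name)  # Mount Everest
print(rest)  # Nepal, 27.9881N, 86.9250E

# sep=None (the default): split on runs of whitespace and discard empty strings
numbers = "  27.9881   86.9250\n".split()
print(numbers)  # ['27.9881', '86.9250']
```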



In [20]:
# Check whether a global name 'split' exists. It does not: split is a method of str, not a standalone function.
print('split' in globals())  # Output: False

False
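
`globals()` lists only the names defined at module level, so it does not include built-in functions such as `len` either. One way to tell built-in functions apart from string methods is to inspect the standard `builtins` module and the `str` type directly, as in this sketch. The `coordinates` string is a made-up example.

```python
import builtins

print(hasattr(builtins, "split"))  # False: there is no built-in split() function
print(hasattr(builtins, "len"))    # True: len() is a built-in function
print(hasattr(str, "split"))       # True: split is a method of str

coordinates = "40.7128N, 74.0060W"  # hypothetical input

# Call the method on a string object...
via_method = coordinates.split(", ")
# ...or, equivalently, through the str type with the string as first argument
via_type = str.split(coordinates, ", ")
print(via_method)  # ['40.7128N', '74.0060W']
print(via_type)    # ['40.7128N', '74.0060W']
```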


In [18]:
coordinates_n = "40.7125N, 74.006W"
# split is a string method, so call it on the string; a bare split() raises NameError
lat_str, lon_str = coordinates_n.split(", ")
print(lat_str, lon_str)


40.7125N 74.006W

## Conclusion

String operations are crucial in geospatial programming, especially when dealing with textual geographic data. Mastering these operations will enable you to handle and manipulate geographic information effectively in your projects.