# This notebook covers  
1. Operations with strings
2. Pattern matching with regular expressions
3. Web scraping (requests, urllib, Beautiful Soup)
4. Data Serialization (simplejson and pickle)
5. Input/Output (file-systems and database systems)

## 1. [Operations with Strings](https://docs.python.org/3/library/string.html)   

A string is a sequence of characters. 
Strings are immutable: every string method returns a new value and never changes the original string.  


In [None]:
text="""
A string is a sequence of characters. 
All the string Methods always return NEW values and do not change or manipulate the original string.
"""
word='characters'

print(word.center(50,'.'))
print("Occurrences of word 'string' in the text : {}".format(text.count('string')))
print("First instance (index) of word 'string' in the text: {}".format(text.find('string')))
print('text swapped cases: {}'.format(text.swapcase()))
print('Capitalize every first letter of words in text: {}'.format(text.title()))
print('***... Sentence ..$%'.strip('*.%$'))  # strip() removes any of the listed characters from both ends
print("another sentence\n".zfill(50))
print('***... Sentence ..$%'+' another sentence\n')
print(text[text.index('string',15):text.find('the',text.index('string',15))])

#### Other string methods include: 
- capitalize() - returns a copy with the first character in uppercase and the rest in lowercase.  
- upper() - converts all the letters of the string to uppercase.  
- split() - returns a list of the words in a string; the default separator is any whitespace.  
- startswith() - returns True if the string starts with the specified value; otherwise, it returns False.  
- endswith() - returns True if the string ends with the specified value; otherwise, it returns False.  
- ljust() - returns the string left-justified in a field of the given width, padded with a specified character (a space by default).  
- rjust() - returns the string right-justified in a field of the given width, padded the same way.  
- strip() - returns a copy of the string with the leading and trailing characters removed. By default, whitespace is removed.  
- zfill() - pads the string with zeros (0) on the left until it reaches the given width.  
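As a quick illustration of the methods listed above, here is a minimal sketch. The sample string `title` is made up for this example; note that the original string is left untouched by every call.

```python
title = "  data wrangling with python  "   # hypothetical sample string

cleaned = title.strip()                 # remove leading/trailing whitespace
print(cleaned.capitalize())             # Data wrangling with python
print(cleaned.upper())                  # DATA WRANGLING WITH PYTHON
print(cleaned.split())                  # ['data', 'wrangling', 'with', 'python']
print(cleaned.startswith("data"))       # True
print(cleaned.endswith("python"))       # True
print(cleaned.ljust(30, "."))           # padded on the right with dots
print(cleaned.rjust(30, "."))           # padded on the left with dots
print("42".zfill(5))                    # 00042

# The original string is unchanged because strings are immutable.
print(repr(title))
```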

## 2. Pattern matching with [regular expressions](https://docs.python.org/3/library/re.html)  

A regular expression is a sequence of characters that defines a search pattern, which you can use to match or find other strings or sets of strings.  
Several characters, most notably the backslash, have a special meaning inside a regular expression. To keep Python from interpreting backslashes before the regex engine sees them, write patterns as raw strings: **r'expression'**.
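A minimal sketch of why raw strings matter, using a made-up sample sentence. The `\b` token means "word boundary" to the regex engine, but in a normal Python string literal it is the backspace character.

```python
import re

sentence = "concatenate the cat"   # hypothetical sample text

# In a normal string literal, "\b" is the backspace character (ASCII 8),
# so the regex engine never sees the word-boundary token.
plain_result = re.findall("\bcat\b", sentence)
print(plain_result)    # []

# In a raw string the backslash is kept, so re receives \b (word boundary).
raw_result = re.findall(r"\bcat\b", sentence)
print(raw_result)      # ['cat']

# The two literals really are different strings.
print(len("\b"), len(r"\b"))   # 1 2
```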

In [None]:
import re

line = "Cats are smarter than dogs"

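# re.match only succeeds if the pattern matches at the start of the string
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)

if matchObj:
    print("matchObj.group() : ", matchObj.group())     # the whole match
    print("matchObj.group(1) : ", matchObj.group(1))   # first parenthesized group
    print("matchObj.group(2) : ", matchObj.group(2))   # second parenthesized group
else:
    print("No match!!")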


In [None]:
# re.search scans the whole string and returns the first location that matches
searchObj = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)

if searchObj:
    print("searchObj.group() : ", searchObj.group())
    print("searchObj.group(1) : ", searchObj.group(1))
    print("searchObj.group(2) : ", searchObj.group(2))
else:
    print("Nothing found!!")

In [None]:
# re.match anchors at the start of the string, so 'dogs' at the end is not found
matchObj = re.match(r'dogs', line, re.M | re.I)
if matchObj:
    print("match --> matchObj.group() : ", matchObj.group())
else:
    print("No match!!")

# re.search looks anywhere in the string, so it finds 'dogs'
searchObj = re.search(r'dogs', line, re.M | re.I)
if searchObj:
    print("search --> searchObj.group() : ", searchObj.group())
else:
    print("Nothing found!!")

In [None]:
# Search and replace
phone = "2004-959-559 # This is Phone Number"

# Delete Python-style comments
num = re.sub(r'#.*$', "", phone)
print("Phone Num : ", num)

# Remove anything other than digits
num = re.sub(r'\D', "", phone)
print("Phone Num : ", num)

## 3. Web Scraping (Data Mining)
### Using requests module  


In [None]:
import requests

# Fetch repository metadata from the GitHub API
r = requests.get('https://api.github.com/repos/psf/requests')

In [None]:
r.json()

In [None]:
r.headers

## 4. Data Serialization   

### [simplejson](https://simplejson.readthedocs.io/en/latest/)   

simplejson is a simple, fast, complete, correct and extensible [JSON](http://json.org/) encoder and decoder. 
> The encoder can be specialized to provide serialization in any kind of situation, without any special support by the objects to be serialized (somewhat like pickle). This is best done with the default kwarg to dumps.  
The decoder can handle incoming JSON strings of any specified encoding (UTF-8 by default).  
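The `default` keyword mentioned in the quote can be sketched as follows. The helper `encode_extra` and the sample `record` are made up for this illustration, and the import falls back to the standard-library `json` module (which accepts the same `default` argument) in case simplejson is not installed.

```python
try:
    import simplejson as json   # the library used in this notebook
except ImportError:
    import json                 # standard-library fallback, assumed equivalent for this example

from datetime import date


def encode_extra(obj):
    """Hypothetical fallback, called only for objects the encoder cannot handle."""
    if isinstance(obj, date):
        return obj.isoformat()
    if isinstance(obj, set):
        return sorted(obj)
    raise TypeError("Cannot serialize object of type {}".format(type(obj).__name__))


record = {"picked_on": date(2020, 1, 15), "fruits": {"plum", "passion"}}   # hypothetical data
encoded = json.dumps(record, default=encode_extra, sort_keys=True)
print(encoded)
```

Raising `TypeError` for unknown types keeps the encoder from silently writing data that cannot be read back.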



In [None]:
import simplejson as json

data={'3': 5, '1': 7,'fruits':['passion', 'strawberry', 'mango','plum']}

# Pretty-print with sorted keys and a four-space indent
print(json.dumps(data, sort_keys=True, indent=4 * ' '))

# Write the data to disk
with open('json_data.json','w') as json_out:
    json.dump(data,json_out)
    
# Read it back
with open('json_data.json','r') as json_in:
    json_data=json.load(json_in)
    
print('JSON data: {}'.format(json_data))

### [pickle](https://docs.python.org/3/library/pickle.html)   

Pickling (serialization with pickle) is the act of translating Python objects into a byte stream that can be transferred from RAM to disk. Pickled Python objects can easily be unpickled back into their original form (deserialization).  
Although the JSON format is human-readable, language-independent, and sometimes faster than pickle, it can represent only a limited subset of Python's built-in types. With pickle, we can easily serialize a much wider range of Python types, including custom classes. This means we don't need to design a custom schema (as we would for JSON) or write error-prone serializers and parsers; pickle does the heavy lifting for us.  
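A small sketch of that difference, using the standard-library `json` module and a made-up `sample` dictionary: JSON rejects sets, complex numbers, and bytes, while pickle round-trips them unchanged.

```python
import json
import pickle

sample = {"ids": {1, 2, 3}, "point": complex(1, 2), "raw": b"bytes"}   # hypothetical data

# JSON has no representation for set, complex, or bytes values.
try:
    json.dumps(sample)
    json_ok = True
except TypeError as err:
    json_ok = False
    print("JSON failed:", err)

# pickle serializes the same object and restores an equal copy.
restored = pickle.loads(pickle.dumps(sample))
print(restored == sample)   # True
```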

#### What can be Pickled and Unpickled
- All native datatypes supported by Python (booleans, None, integers, floats, complex numbers, strings, bytes, byte arrays)
- Dictionaries, sets, lists, and tuples - as long as they contain pickleable objects
- Functions and classes that are defined at the top level of a module  
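To illustrate the last point, here is a minimal sketch in which `math.sqrt` stands in for any importable top-level function. The pickle stores only a reference (module name plus function name), which is why a lambda, having no importable name, cannot be pickled.

```python
import math
import pickle

# A top-level function is stored as a reference, not as code.
payload = pickle.dumps(math.sqrt)
print(b"math" in payload, b"sqrt" in payload)   # True True
print(pickle.loads(payload) is math.sqrt)       # True

# A lambda has no importable name, so pickle cannot store a reference to it.
try:
    pickle.dumps(lambda x: x * 2)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError) as err:
    lambda_pickled = False
    print("Cannot pickle lambda:", err)
```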

> It is important to note that pickling is not a language-independent serialization method, so pickled data can only be unpickled using Python. It is also safest to unpickle objects with the same version of Python that was used to pickle them, because mixing Python versions can cause compatibility problems.  
Additionally, functions are pickled by their name references, and not by their value. The resulting pickle does not contain information on the function's code or attributes. Therefore, you have to make sure that the environment where the function is unpickled is able to import the function. In other words, if we pickle a function and then unpickle it in an environment where it's either not defined or not imported, an exception will be raised.  
#### It is also very important to note that pickled objects can be used in malevolent ways. For instance, unpickling data from an untrusted source can result in the execution of a malicious piece of code.
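The following harmless sketch shows the mechanism behind that warning. The `Payload` class is hypothetical and only calls `print`, but whoever creates the bytes could name any importable callable instead, which is why `pickle.load` should only be used on data you trust.

```python
import pickle


class Payload:
    """Stand-in for attacker-controlled data (hypothetical and harmless)."""

    def __reduce__(self):
        # pickle will call print(...) with this argument during loading.
        return (print, ("this ran during pickle.loads()",))


untrusted_bytes = pickle.dumps(Payload())

# Loading runs the callable chosen by whoever created the bytes.
result = pickle.loads(untrusted_bytes)
print(result)   # None: the return value of print(), not a Payload instance
```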

In [None]:
#pickling (serializing)

import pickle

fruits = ['passion', 'strawberry', 'mango','plum']

with open('fruits_pickle.pkl', 'wb') as pickle_out:
    pickle.dump(fruits, pickle_out)

In [None]:
#unpickling (deserialization)

with open('fruits_pickle.pkl', 'rb') as pickle_in:
    unpickled_data = pickle.load(pickle_in)

print(unpickled_data)

In [None]:
#Pickling and Unpickling Custom Objects

class NeuralNetwork():
    def __init__(self):
        self.activated = False
        self.training_data = None

    def activate(self):
        self.activated = True

    def set_training_data(self, data):
        self.training_data = data


model = NeuralNetwork()
model.activate()
model.set_training_data(unpickled_data)  # list loaded from fruits_pickle.pkl in the previous cell

with open('object_pickle', 'wb') as pickle_out:
    pickle.dump(model, pickle_out)

with open('object_pickle', 'rb') as pickle_in:
    unpickled_object = pickle.load(pickle_in)

print('Activated: {} \nTraining Data: {}'.format(unpickled_object.activated, unpickled_object.training_data))

**Remember that we can only unpickle the object in an environment where the class NeuralNetwork is defined or imported. If we create a new script and try to unpickle the object without importing the NeuralNetwork class, we'll get an `AttributeError`.**
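This failure can be reproduced in a single script by deleting the class before unpickling. The sketch below assumes it runs as a top-level script or notebook cell, and the `Sensor` class is a hypothetical stand-in for `NeuralNetwork`.

```python
import pickle


class Sensor:   # hypothetical class standing in for NeuralNetwork
    def __init__(self):
        self.activated = True


payload = pickle.dumps(Sensor())

# Simulate a fresh environment where the class was never defined or imported.
del Sensor

try:
    pickle.loads(payload)
    error_name = None
except AttributeError as err:
    error_name = type(err).__name__
    print("Unpickling failed:", err)
```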

## 5. Input and Output (IO)  

### Load from and save to file systems  
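A minimal sketch of saving to and loading from a text file with the built-in `open` function. The file name and sample data are made up, and a temporary directory is used so the example leaves nothing behind.

```python
import os
import tempfile

lines = ["passion", "strawberry", "mango", "plum"]   # hypothetical sample data

# Work in a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "fruits.txt")   # hypothetical file name

    # "w" creates or truncates the file; the context manager closes it.
    with open(path, "w", encoding="utf-8") as file_out:
        file_out.write("\n".join(lines))

    # "r" opens for reading; iterate line by line and drop the newline.
    with open(path, "r", encoding="utf-8") as file_in:
        loaded = [line.rstrip("\n") for line in file_in]

print(loaded == lines)   # True
```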


### IO operations using database systems  
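One possible sketch uses the standard-library `sqlite3` module with an in-memory database. The choice of SQLite, the table name, and the columns are all assumptions made for illustration; other database systems follow a similar connect, execute, and fetch pattern through their own drivers.

```python
import sqlite3

# ":memory:" creates a throwaway database that lives only in RAM.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE fruits (name TEXT, quantity INTEGER)")   # hypothetical schema

rows = [("passion", 5), ("plum", 7)]
# "?" placeholders let the driver handle quoting of the values.
connection.executemany("INSERT INTO fruits VALUES (?, ?)", rows)
connection.commit()

loaded = connection.execute(
    "SELECT name, quantity FROM fruits ORDER BY name"
).fetchall()
connection.close()

print(loaded)   # [('passion', 5), ('plum', 7)]
```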